Question

I mean 100+ MB big; such text files can push the envelope of editors.

I need to look through a large XML file, but cannot if the editor is buggy.

Any suggestions?

Was it helpful?

Solution

VS Code (Windows, macOS, Linux) - Free and Open Source with a nice GUI. Edited a 3.6 GB JSON file, loaded in a minute. You must have enough RAM to load the files.

Free read-only viewers:

  • glogg (Windows, macOS, Linux) – Confirmed to handle multi-GB files. Its main feature is regular expression search. Has tabs, reads files directly from disk, can watch/follow files, and allows user to mark lines.
  • LogExpert (Windows) – "A GUI replacement for tail." Supports file following, searching, filtering, configurable highlighting, plugins, and external tools.
  • Large Text File Viewer (Windows) – Minimalist and has very small executable size. Supports split view, text theme customization, regex search, and file following.
  • Lister (Windows) – Even more small and minimalist. It's one executable, barely 500 KB, but it still supports searching (with regexes), printing, a hex editor mode, and settings.

Free editors:

  • Vim and Emacs (Windows, macOS, Linux) – Classic Unix editors. Steep learning curve, but brutally efficient. They have settings that can be tuned to make them even faster.
  • Large File Editor (Windows) – Opens and edits TB+ files, supports Unicode, uses little memory, has XML-specific features, and includes a binary mode.
  • HxD (Windows) – A hex editor, not a text editor; but it's amazingly fast and useful.
  • GigaEdit (Windows) – Supports searching, character statistics, and font customization. But it's buggy – with large files, it only allows overwriting characters, not inserting them; it doesn't respect LF as a line terminator, only CRLF; and it's slow.

Builtin programs (no installation required):

  • less (macOS, Linux) – The traditional Unix command-line pager tool. Lets you view text files of practically any size. Can be installed on Windows, too.
  • Notepad (Windows) – Decent with large files, especially with word wrap turned off.
  • MORE (Windows) – This refers to the Windows MORE, not the Unix more. A console program that allows you to view a file, one screen at a time.

Web viewers:

  • htmlpen.com – Can open and syntax-highlight TB+ files. Allows editing, except for very large files. Supports searching, regexes, and exporting.
  • readfileonline.com – Another HTML5 large file viewer. Supports search.

Paid editors:

  • 010 Editor (Windows, macOS, Linux) – Opens giant (as much as 50 GB) files.
  • SlickEdit (Windows, macOS, Linux) – Opens large files.
  • UltraEdit (Windows, macOS, Linux) – Opens files of more than 6 GB, but the configuration must be changed for this to be practical: Menu » Advanced » Configuration » File Handling » Temporary Files » Open file without temp file...
  • EmEditor (Windows) – Handles very large text files nicely (officially up to 248 GB, but as much as 900 GB according to one report).

And finally, have you tried opening the large file with your regular editor? Some editors can actually handle reasonably large files. In particular, Notepad++ (Windows) and Sublime Text (Windows, macOS, Linux) support files in the 2 GB range.

OTHER TIPS

Tips and tricks

less

Why are you using editors to just look at a (large) file?

Under *nix or Cygwin, just use less. (There is a famous saying – "less is more, more or less" – because "less" replaced the earlier Unix command "more", with the addition that you could scroll back up.) Searching and navigating under less is very similar to Vim, but there is no swap file and little RAM used.

There is a Win32 port of GNU less. See the "less" section of the answer above.

Perl

Perl is good for quick scripts, and its .. (range flip-flop) operator makes for a nice selection mechanism to limit the crud you have to wade through.

For example:

$ perl -n -e 'print if ( 1000000 .. 2000000)' humongo.txt | less

This will extract everything from line 1 million to line 2 million, and allow you to sift the output manually in less.

Another example:

$ perl -n -e 'print if ( /regex one/ .. /regex two/)' humongo.txt | less

This starts printing when the "regular expression one" finds something, and stops when the "regular expression two" find the end of an interesting block. It may find multiple blocks. Sift the output...

logparser

This is another useful tool you can use. To quote the Wikipedia article:

logparser is a flexible command line utility that was initially written by Gabriele Giuseppini, a Microsoft employee, to automate tests for IIS logging. It was intended for use with the Windows operating system, and was included with the IIS 6.0 Resource Kit Tools. The default behavior of logparser works like a "data processing pipeline", by taking an SQL expression on the command line, and outputting the lines containing matches for the SQL expression.

Microsoft describes Logparser as a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows operating system such as the Event Log, the Registry, the file system, and Active Directory. The results of the input query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.

Example usage:

C:\>logparser.exe -i:textline -o:tsv "select Index, Text from 'c:\path\to\file.log' where line > 1000 and line < 2000"
C:\>logparser.exe -i:textline -o:tsv "select Index, Text from 'c:\path\to\file.log' where line like '%pattern%'"

The relativity of sizes

100 MB isn't too big. 3 GB is getting kind of big. I used to work at a print & mail facility that created about 2% of U.S. first class mail. One of the systems for which I was the tech lead accounted for about 15+% of the pieces of mail. We had some big files to debug here and there.

And more...

Feel free to add more tools and information here. This answer is community wiki for a reason! We all need more advice on dealing with large amounts of data...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top