Question

I'm currently developing an application that happens to require some file preprocessing before actually reading the data.

Doing it externally was not a possibility so I came up with a fork & execve of "cut options filename | sort | uniq -c" etc... and I execute it like that.

However I thought that maybe there was already another option to reuse all those ancient and good working tools directly in my code and not having to invoke them through a shell.

I am currently looking at busybox to see if there is an easy way of statically linking and programmatically calling those utils, but no luck yet.


Solution

Arkaitz, the answer is no, because of how you've phrased the question.

You ask for "another option to reuse all those ancient and good working tools directly in my code and not having to invoke them through a shell"

The problem with that is, the proper and accepted way of reusing all those ancient and good working tools is exactly what you're saying you want to avoid: invoking them via a shell (or at least firing them up as child processes, via popen for example). It's definitely not recommended to try to subsume, copy, or duplicate these tools into your code.

The UNIX (and Linux) model for data manipulation is robust and proven - why would you want to avoid it?

OTHER TIPS

The 'ancient' tools were built for use by the shell, not to be built/linked into an executable. There are, however, more recent tools that do a lot of what your command-line preprocessor does: iostreams with extractors (to replace cut), and std::sort and std::unique to replace the respective programs...

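#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct S { string col1, col3;
   bool operator<( const S& s ) const { return col1 < s.col1; }
   bool operator==( const S& s ) const { return col1 == s.col1; }  // needed by unique
};

int main() {
  vector<S> v;
  S s;
  string dummy;
  // keep columns 1 and 3 of each four-column line (replaces cut)
  while( cin >> s.col1 >> dummy >> s.col3 >> dummy )
    v.push_back( s );
  sort( v.begin(), v.end() );                         // replaces sort
  v.erase( unique( v.begin(), v.end() ), v.end() );   // replaces uniq
}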

Not too complicated, I think.

Try popen().

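char buffer [ BUFFER_SIZE ];
FILE * f = popen( "cut options filename | sort | uniq -c", "r" );
if( f != NULL ) {
  /* test fgets() itself: feof() only becomes true after a read has failed */
  while( fgets( buffer, BUFFER_SIZE, f ) != NULL ) {
    /* buffer now holds the next line (or chunk of a line) of output */
  }
  pclose( f );
}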

Reference: How to execute a command and get output of command within C++ using POSIX?

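You have to do it through the shell, but it's easier to use the system() call. Note that system() only returns the command's exit status, so the command has to write its output to a file if you need to read it afterwards.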

while(something) {
    int ret = system("foo");

    /* stop looping if the command was killed by SIGINT or SIGQUIT */
    if (WIFSIGNALED(ret) &&
        (WTERMSIG(ret) == SIGINT || WTERMSIG(ret) == SIGQUIT))
        break;
}

Just write another useful 'ancient and good' tool ;) and read all data from stdin and return it to stdout.

cat *.txt | grep 'blabla' | sort | my_new_tool | tee res_file

The nice way to do it is:

  • Create 2 pipes
  • Fork a new process
  • Replace stdin and stdout for child process with pipes using dup2 function
  • exec a command you'd like
  • Write and read from parent process using pipes
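The steps above can be sketched roughly as follows. The helper name run_filter is hypothetical, and the sketch makes a simplifying assumption: it writes all of the input before reading any output, so it only suits commands such as sort that consume their whole input first, or inputs small enough to fit in the pipe buffers. A more robust version would interleave reads and writes (for example with poll) and deal with SIGPIPE.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] with `input` on its stdin and collects
// its stdout in `output`. Returns true only if the child exited with status 0.
bool run_filter(char* const argv[], const std::string& input, std::string& output) {
    int to_child[2], from_child[2];                  // step 1: create 2 pipes
    if (pipe(to_child) == -1) return false;
    if (pipe(from_child) == -1) {
        close(to_child[0]); close(to_child[1]);
        return false;
    }

    pid_t pid = fork();                              // step 2: fork
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return false;
    }
    if (pid == 0) {                                  // child process
        // step 3: replace stdin and stdout with the pipe ends
        if (dup2(to_child[0], STDIN_FILENO) == -1 ||
            dup2(from_child[1], STDOUT_FILENO) == -1)
            _exit(127);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);                       // step 4: exec the command
        _exit(127);                                  // reached only if exec failed
    }

    // parent process: close the ends that belong to the child
    close(to_child[0]);
    close(from_child[1]);

    // step 5a: write the input, then close so the child sees end-of-file
    const char* p = input.data();
    std::size_t left = input.size();
    while (left > 0) {
        ssize_t n = write(to_child[1], p, left);
        if (n <= 0) break;
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    close(to_child[1]);

    // step 5b: read everything the child writes
    char buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof buf)) > 0)
        output.append(buf, static_cast<std::size_t>(n));
    close(from_child[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the command is started with execvp rather than through a shell, a pipeline such as cut | sort | uniq -c would need one such child per stage, or a single child running sh -c with the whole pipeline as its argument.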

busybox was my first thought as well, although you might also want to consider embedding a scripting engine like Python and doing this kind of manipulation in Python scripts.

I would definitely not try to strip this kind of functionality out of GNU command line tools since they have grown significantly since the early UNIX days and sprouted an awful lot of options.

If the busybox code seems too hard to adapt, then the next place I would look would be Minix source code. Look under Previous Versions and pick one of the version 1 or 2 Minixes because those were written as teaching code so they tend to be clearer and simpler.

If you do not want to call external commands (whether by exec, popen, system, etc.) and also do not want to modify the source of these utilities and compile them into your code (relatively easy, just change 'main' to 'main_cut' etc.), then the only remaining option I see is to embed the utilities inside your code and either extract them at runtime or dynamically create a file system by pointing at the data inside your code (e.g. using a floppy or CD image and writing a FUSE module that picks up the disk image data from a RAM address). All of which seems like a lot of work just to make this look like a single neatly packaged utility.
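To illustrate the renaming route mentioned above, here is a minimal sketch of how a renamed entry point could be called from application code. Both the ToolMain alias and the call_tool helper are hypothetical names introduced for this example; main_cut stands for whatever a utility's main would be renamed to.

```cpp
#include <string>
#include <vector>

// Signature that a renamed utility entry point (e.g. a hypothetical
// main_cut) would keep from its original main.
using ToolMain = int (*)(int, char**);

// Builds a NULL-terminated argv from `args` and calls the entry point.
// `args` is taken by value so argv can point at writable buffers.
int call_tool(ToolMain tool_main, std::vector<std::string> args) {
    std::vector<char*> argv;
    for (std::string& a : args)
        argv.push_back(&a[0]);
    argv.push_back(nullptr);   // argv[argc] must be a null pointer
    return tool_main(static_cast<int>(args.size()), argv.data());
}
```

A call would then look like call_tool(main_cut, {"cut", "-f1", "filename"}). Bear in mind that a utility compiled in this way typically still reads stdin, writes stdout, keeps state in globals, and may call exit() on errors, so the calling code would have to redirect file descriptors or use temporary files to get at the results.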

Personally, if I really had to do this, I'd get the source of all those utils and compile them in as external calls. Of course, you'd no longer have pipes easily available; you'd have to use either temp files for preprocessing or something more complicated involving coroutines. Or maybe sockets. Lots of work, and messy whatever you do!

Licensed under: CC-BY-SA with attribution