Question

Is there a way to include an entire text file as a string in a C program at compile-time?

something like:

  • file.txt:

    This is
    a little
    text file
    
  • main.c:

    #include <stdio.h>
    int main(void) {
       #blackmagicinclude("file.txt", content)
       /*
       equiv: char content[] = "This is\na little\ntext file";
       */
       printf("%s", content);
    }
    

obtaining a little program that prints on stdout "This is a little text file"

At the moment I use a hackish Python script, but it's butt-ugly and limited to only one variable name. Can you tell me another way to do it?

Solution

I'd suggest using the Unix utility xxd for this. You can use it like so:

$ echo hello world > a
$ xxd -i a

outputs:

unsigned char a[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;
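
To show how the generated array might be consumed, here is a minimal sketch. The array is pasted inline so the example is self-contained; in a real build you would more likely redirect the xxd output to a header and #include it. The helper name write_embedded is made up for illustration. Note that the array has no terminating NUL, so the length is passed explicitly rather than printing with "%s".

```c
#include <assert.h>
#include <stdio.h>

/* Pasted from the `xxd -i a` output shown above. */
unsigned char a[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;

/* Hypothetical helper: writes the embedded bytes to `out` and returns
   the number of bytes written. The array is not NUL-terminated, so the
   length comes from a_len, not from strlen or "%s". */
size_t write_embedded(FILE *out)
{
    assert(out != NULL);
    return fwrite(a, 1, a_len, out);
}
```

Calling write_embedded(stdout) from main would print the file's content. If you need a C string instead, you could append a 0x00 byte to the generated initializer yourself.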

OTHER TIPS

The question was about C, but in case someone tries to do it with C++11, it can be done with only small changes to the included text file thanks to the new raw string literals:

In C++ do this:

const char *s =
#include "test.txt"
;

In the text file do this:

R"(Line 1
Line 2
Line 3
Line 4
Line 5
Line 6)"

So the file only needs a prefix at the top and a suffix at the end. In between you can write what you want; no special escaping is necessary as long as you don't need the character sequence )". But even this can work if you specify your own custom delimiter:

R"=====(Line 1
Line 2
Line 3
Now you can use "( and )" in the text file, too.
Line 5
Line 6)====="

You have two possibilities:

  1. Make use of compiler/linker extensions to convert a file into a binary file, with proper symbols pointing to the begin and end of the binary data. See this answer: Include binary file with GNU ld linker script.
  2. Convert your file into a sequence of character constants that can initialize an array. Note you can't just do "" and span multiple lines. You would need a line continuation character (\), escape " characters and others to make that work. Easier to just write a little program to convert the bytes into a sequence like '\xFF', '\xAB', ...., '\0' (or use the unix tool xxd described by another answer, if you have it available!):

Code:

#include <stdio.h>

int main() {
    int c;
    while((c = fgetc(stdin)) != EOF) {
        printf("'\\x%X',", (unsigned)c);
    }
    printf("'\\0'"); // put terminating zero
}

(not tested). Then do:

char my_file[] = {
#include "data.h"
};

Where data.h is generated by

cat file.bin | ./bin2c > data.h

OK, inspired by Daemin's post, I tested the following simple example:

a.data:

"this is test\n file\n"

test.c:

int main(void)
{
    char *test = 
#include "a.data"
    ;
    return 0;
}

gcc -E test.c output:

# 1 "test.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "test.c"

int main(void)
{
    char *test =
# 1 "a.data" 1
"this is test\n file\n"
# 6 "test.c" 2
    ;
    return 0;
}

So it works, but it requires the data to be surrounded with quotation marks.

I like kayahr's answer. If you don't want to touch the input files, however, and you are using CMake, you can have it add the delimiter character sequences to the file. The following CMake code, for instance, copies the input files and wraps their content accordingly:

function(make_includable input_file output_file)
    file(READ ${input_file} content)
    set(delim "for_c++_include")
    set(content "R\"${delim}(\n${content})${delim}\"")
    file(WRITE ${output_file} "${content}")
endfunction(make_includable)

# Use like
make_includable(external/shaders/cool.frag generated/cool.frag)

Then include in c++ like this:

constexpr const char *test =
#include "generated/cool.frag"
;

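What you might try is something like the following (note that a typical compiler rejects the unterminated quotes as written, so treat it as a sketch of the idea rather than working code):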

int main()
{
    const char* text = "
#include "file.txt"
";
    printf("%s", text);
    return 0;
}

Of course you'll have to be careful with what is actually in the file, making sure there are no double quotes, that all appropriate characters are escaped, etc.

Therefore it might be easier if you just load the text from a file at runtime, or embed the text directly into the code.
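
For the runtime route, a minimal sketch of loading a whole file into a NUL-terminated string could look like this. The function name read_text_file and the initial buffer size are assumptions made for illustration, and error handling is kept deliberately simple.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a freshly
   allocated, NUL-terminated buffer. Stores the length in *len_out when
   len_out is not NULL. Returns NULL on failure; the caller frees the
   result with free(). */
char *read_text_file(const char *path, size_t *len_out)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* Always leave one byte free for the terminating NUL. */
    while ((got = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += got;
        if (cap - len <= 1) {           /* buffer full: double it */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

The trade-off compared to embedding is that the file has to be present and readable next to the program at runtime.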

If you still wanted the text in another file you could have it in there, but it would have to be represented there as a string. You would use the code as above but without the double quotes in it. For example:

"Something evil\n"\
"this way comes!"

int main()
{
    const char* text =
#include "file.txt"
;
    printf("%s", text);
    return 0;
}

You need my xtr utility, but you can do it with a bash script. This is a script I call bin2inc. The first parameter is the name of the resulting char[] variable. The second parameter is the name of the file. The output is a C include file with the file content encoded (in lowercase hex) as the variable name given. The char array is zero-terminated, and the length of the data is stored in $variableName_length

#!/bin/bash

fileSize ()
{
    [ -e "$1" ]  && {
        set -- `ls -l "$1"`;
        echo $5;
    }
}

echo unsigned char $1'[] = {'
./xtr -fhex -p 0x -s ', ' < "$2";
echo '0x00'
echo '};';
echo '';
echo unsigned long int ${1}_length = $(fileSize "$2")';'

You can get xtr here. xtr (character eXTRapolator) is licensed under the GPLv3.

You can do this using objcopy:

objcopy --input binary --output elf64-x86-64 myfile.txt myfile.o

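Now you have an object file you can link into your executable. It contains symbols for the beginning, end, and size of the content from myfile.txt; GNU objcopy derives their names from the input file name, here _binary_myfile_txt_start, _binary_myfile_txt_end, and _binary_myfile_txt_size. The --output target shown is for x86-64 ELF and needs to match your platform.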

I reimplemented xxd in python3, fixing all of xxd's annoyances:

  • Const correctness
  • string length datatype: int → size_t
  • Null termination (in case you might want that)
  • C string compatible: Drop unsigned on the array.
  • Smaller, readable output, as you would have written it: Printable ascii is output as-is; other bytes are hex-encoded.

Here is the script, filtered by itself, so you can see what it does:

pyxxd.c

#include <stddef.h>

extern const char pyxxd[];
extern const size_t pyxxd_len;

const char pyxxd[] =
"#!/usr/bin/env python3\n"
"\n"
"import sys\n"
"import re\n"
"\n"
"def is_printable_ascii(byte):\n"
"    return byte >= ord(' ') and byte <= ord('~')\n"
"\n"
"def needs_escaping(byte):\n"
"    return byte == ord('\\\"') or byte == ord('\\\\')\n"
"\n"
"def stringify_nibble(nibble):\n"
"    if nibble < 10:\n"
"        return chr(nibble + ord('0'))\n"
"    return chr(nibble - 10 + ord('a'))\n"
"\n"
"def write_byte(of, byte):\n"
"    if is_printable_ascii(byte):\n"
"        if needs_escaping(byte):\n"
"            of.write('\\\\')\n"
"        of.write(chr(byte))\n"
"    elif byte == ord('\\n'):\n"
"        of.write('\\\\n\"\\n\"')\n"
"    else:\n"
"        of.write('\\\\x')\n"
"        of.write(stringify_nibble(byte >> 4))\n"
"        of.write(stringify_nibble(byte & 0xf))\n"
"\n"
"def mk_valid_identifier(s):\n"
"    s = re.sub('^[^_a-z]', '_', s)\n"
"    s = re.sub('[^_a-z0-9]', '_', s)\n"
"    return s\n"
"\n"
"def main():\n"
"    # `xxd -i` compatibility\n"
"    if len(sys.argv) != 4 or sys.argv[1] != \"-i\":\n"
"        print(\"Usage: xxd -i infile outfile\")\n"
"        exit(2)\n"
"\n"
"    with open(sys.argv[2], \"rb\") as infile:\n"
"        with open(sys.argv[3], \"w\") as outfile:\n"
"\n"
"            identifier = mk_valid_identifier(sys.argv[2]);\n"
"            outfile.write('#include <stddef.h>\\n\\n');\n"
"            outfile.write('extern const char {}[];\\n'.format(identifier));\n"
"            outfile.write('extern const size_t {}_len;\\n\\n'.format(identifier));\n"
"            outfile.write('const char {}[] =\\n\"'.format(identifier));\n"
"\n"
"            while True:\n"
"                byte = infile.read(1)\n"
"                if byte == b\"\":\n"
"                    break\n"
"                write_byte(outfile, ord(byte))\n"
"\n"
"            outfile.write('\";\\n\\n');\n"
"            outfile.write('const size_t {}_len = sizeof({}) - 1;\\n'.format(identifier, identifier));\n"
"\n"
"if __name__ == '__main__':\n"
"    main()\n"
"";

const size_t pyxxd_len = sizeof(pyxxd) - 1;

Usage (this extracts the script):

#include <stdio.h>

extern const char pyxxd[];
extern const size_t pyxxd_len;

int main()
{
    fwrite(pyxxd, 1, pyxxd_len, stdout);
}
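
One caveat with hex escapes in string literals: in C a \x escape has no fixed length, so a hex-escaped byte followed by a character that happens to be a hex digit (say 0x01 followed by 'a') is read as one longer escape. Octal escapes stop after three digits and avoid that. The following is not the script above but a rough C sketch of the same escaping idea using octal; the name escape_bytes and its buffer-based interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the body of a C string literal for the
   given bytes into `out`. Returns the number of characters written
   (excluding the final NUL), or (size_t)-1 if `out` is too small. */
size_t escape_bytes(const unsigned char *in, size_t len,
                    char *out, size_t out_size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;

    for (size_t i = 0; i < len; ++i) {
        unsigned char byte = in[i];

        /* Worst case per byte: four characters plus the final NUL. */
        if (out_size - used < 5)
            return (size_t)-1;

        if (byte == '"' || byte == '\\') {
            out[used++] = '\\';
            out[used++] = (char)byte;
        } else if (byte == '\n') {
            memcpy(out + used, "\\n", 2);
            used += 2;
        } else if (byte >= ' ' && byte <= '~') {
            out[used++] = (char)byte;       /* printable ASCII as-is */
        } else {
            /* Always three octal digits, so a digit that follows in the
               data can never be absorbed into the escape. */
            used += (size_t)snprintf(out + used, 5, "\\%03o", (unsigned)byte);
        }
    }

    out[used] = '\0';
    return used;
}
```

A generator built on this would wrap the result in quotes and emit the array declaration around it, much like the script does.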

Even if it can be done at compile time (I don't think it can in general), the text would likely be the preprocessed header rather than the file's contents verbatim. I expect you'll have to load the text from the file at runtime or do a nasty cut-n-paste job.

in x.h

"this is a "
"buncha text"

in main.c

#include <stdio.h>
int main(void)
{
    char *textFileContents =
#include "x.h"
    ;

    printf("%s\n", textFileContents);

    return 0;
}

ought to do the job.

Hasturkun's answer using the xxd -i option is excellent. If you want to incorporate the conversion process (text -> hex include file) directly into your build the hexdump.c tool/library recently added a capability similar to xxd's -i option (it doesn't give you the full header - you need to provide the char array definition - but that has the advantage of letting you pick the name of the char array):

http://25thandclement.com/~william/projects/hexdump.c.html

Its license is a lot more "standard" than xxd's and is very liberal. An example of using it to embed an init file in a program can be seen in the CMakeLists.txt and scheme.c files here:

https://github.com/starseeker/tinyscheme-cmake

There are pros and cons both to including generated files in source trees and bundling utilities - how to handle it will depend on the specific goals and needs of your project. hexdump.c opens up the bundling option for this application.

I think it is not possible with the compiler and preprocessor alone. gcc allows this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

    printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
            STRGF(
#               define hostname my_dear_hostname
                hostname
            )
            "\n" );

But unfortunately not this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

    printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
            STRGF(
#               include "/etc/hostname"
            )
            "\n" );

The error is:

/etc/hostname: In function ‘init_module’:
/etc/hostname:1:0: error: unterminated argument list invoking macro "STRGF"
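
For comparison, here is a small self-contained sketch of the variant that does work: the value arrives as a macro (for instance from the compiler command line) and goes through the two-level stringify macro. The names STRINGIFY, BUILD_HOST and build_banner are made up for this example, and supplying the host name with something like -DBUILD_HOST=... is an assumption about your build setup.

```c
#include <assert.h>

/* Two-level stringification: the outer macro lets its argument expand
   first, the inner one turns the expanded tokens into a string literal. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Hypothetical default; a build could override it on the command line. */
#ifndef BUILD_HOST
#define BUILD_HOST my_dear_hostname
#endif

/* Catch an empty BUILD_HOST at compile time (requires C11). */
static_assert(sizeof(STRINGIFY(BUILD_HOST)) > 1, "BUILD_HOST must not be empty");

/* Adjacent string literals are concatenated by the compiler. */
const char *build_banner(void)
{
    return "built on host " STRINGIFY(BUILD_HOST) "\n";
}
```

This only moves the problem to the build system, which has to read the file and pass its content along, but it stays within what the preprocessor can do.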

Why not link the text into the program and use it as a global variable? Here is an example. I'm considering using this to include OpenGL shader files within an executable, since GL shaders need to be compiled for the GPU at runtime.

I had similar issues, and for small files the aforementioned solution of Johannes Schaub worked like a charm for me.

However, for files that are a bit larger, it ran into issues with the character array limit of the compiler. Therefore, I wrote a small encoder application that converts file content into a 2D character array of equally sized chunks (and possibly padding zeros). It produces output textfiles with 2D array data like this:

const char main_js_file_data[8][4]= {
    {'\x69','\x73','\x20','\0'},
    {'\x69','\x73','\x20','\0'},
    {'\x61','\x20','\x74','\0'},
    {'\x65','\x73','\x74','\0'},
    {'\x20','\x66','\x6f','\0'},
    {'\x72','\x20','\x79','\0'},
    {'\x6f','\x75','\xd','\0'},
    {'\xa','\0','\0','\0'}};

where 4 is actually a variable MAX_CHARS_PER_ARRAY in the encoder. The file with the resulting C code, called, for example "main_js_file_data.h" can then easily be inlined into the C++ application, for example like this:

#include "main_js_file_data.h"
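
If the application needs the content as one contiguous string again, the chunks can be joined at startup. The following is only a sketch under the assumption that the data is text without embedded NUL bytes (each chunk is treated as a NUL-terminated string); the function name join_chunks is made up, and the sample array is copied inline from the output above to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHARS_PER_ARRAY 4

/* Sample data copied from the encoder output shown above. */
const char main_js_file_data[8][MAX_CHARS_PER_ARRAY] = {
    {'\x69','\x73','\x20','\0'},
    {'\x69','\x73','\x20','\0'},
    {'\x61','\x20','\x74','\0'},
    {'\x65','\x73','\x74','\0'},
    {'\x20','\x66','\x6f','\0'},
    {'\x72','\x20','\x79','\0'},
    {'\x6f','\x75','\xd','\0'},
    {'\xa','\0','\0','\0'}};

/* Hypothetical helper: copies the NUL-terminated chunks back to back
   into `out`. Returns the number of characters written (excluding the
   final NUL), or (size_t)-1 if `out` is too small. */
size_t join_chunks(const char chunks[][MAX_CHARS_PER_ARRAY], size_t num_chunks,
                   char *out, size_t out_size)
{
    size_t used = 0;

    assert(chunks != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;

    for (size_t i = 0; i < num_chunks; ++i) {
        size_t n = strlen(chunks[i]);
        if (out_size - used <= n)       /* keep room for the final NUL */
            return (size_t)-1;
        memcpy(out + used, chunks[i], n);
        used += n;
    }

    out[used] = '\0';
    return used;
}
```

For binary data this would not be enough, since the padding zeros in the last chunk cannot be told apart from real zero bytes; the encoder would have to emit the original length as well.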

Here is the source code of the encoder:

#include <fstream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <cmath>   // std::ceil


#define MAX_CHARS_PER_ARRAY 2048


int main(int argc, char * argv[])
{
    // three parameters: input filename, output filename, variable name
    if (argc < 4)
    {
        return 1;
    }

    // buffer data, packaged into chunks
    std::vector<char> bufferedData;

    // open input file, in binary mode
    {    
        std::ifstream fStr(argv[1], std::ios::binary);
        if (!fStr.is_open())
        {
            return 1;
        }

        bufferedData.assign(std::istreambuf_iterator<char>(fStr), 
                            std::istreambuf_iterator<char>()     );
    }

    // write output text file, containing a variable declaration,
    // which will be a fixed-size two-dimensional plain array
    {
        std::ofstream fStr(argv[2]);
        if (!fStr.is_open())
        {
            return 1;
        }
        const std::size_t numChunks = std::size_t(std::ceil(double(bufferedData.size()) / (MAX_CHARS_PER_ARRAY - 1)));
        fStr << "const char " << argv[3] << "[" << numChunks           << "]"    <<
                                            "[" << MAX_CHARS_PER_ARRAY << "]= {" << std::endl;
        std::size_t count = 0;
        fStr << std::hex;
        while (count < bufferedData.size())
        {
            std::size_t n = 0;
            fStr << "{";
            for (; n < MAX_CHARS_PER_ARRAY - 1 && count < bufferedData.size(); ++n)
            {
                fStr << "'\\x" << int(static_cast<unsigned char>(bufferedData[count++])) << "',";
            }
            // fill missing part to reach fixed chunk size with zero entries
            for (std::size_t j = 0; j < (MAX_CHARS_PER_ARRAY - 1) - n; ++j)
            {
                fStr << "'\\0',";
            }
            fStr << "'\\0'}";
            if (count < bufferedData.size())
            {
                fStr << ",\n";
            }
        }
        fStr << "};\n";
    }

    return 0;
}

If you're willing to resort to some dirty tricks you can get creative with raw string literals and #include for certain types of files.

For example, say I want to include some SQL scripts for SQLite in my project and I want to get syntax highlighting but don't want any special build infrastructure. I can have this file test.sql which is valid SQL for SQLite where -- starts a comment:

--x, R"(--
SELECT * from TestTable
WHERE field = 5
--)"

And then in my C++ code I can have:

int main()
{
    auto x = 0;
    const char* mysql = (
#include "test.sql"
    );

    cout << mysql << endl;
}

The output is:

--
SELECT * from TestTable
WHERE field = 5
--

Or to include some Python code from a file test.py which is a valid Python script (because # starts a comment in Python and pass is a no-op):

#define pass R"(
pass
def myfunc():
    print("Some Python code")

myfunc()
#undef pass
#define pass )"
pass

And then in the C++ code:

int main()
{
    const char* mypython = (
#include "test.py"
    );

    cout << mypython << endl;
}

Which will output:

pass
def myfunc():
    print("Some Python code")

myfunc()
#undef pass
#define pass

It should be possible to play similar tricks for various other types of code you might want to include as a string. Whether or not it is a good idea I'm not sure. It's kind of a neat hack but probably not something you'd want in real production code. Might be ok for a weekend hack project though.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow