How do I split a string in C++?

https://stackoverflow.com/questions/53849

09-06-2019

Question

Java has a convenient split method:

String str = "The quick brown fox";
String[] results = str.split(" ");

Is there an easy way to do this in C++?

Solution

The simplest cases can be handled with the std::string::find method. However, take a look at Boost.Tokenizer. It's great. Boost generally has some very cool string tools.
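
The answer names std::string::find without showing code, so here is a minimal sketch of what such a split might look like. The function name split_by_find and the choice to keep empty tokens are assumptions made for illustration, not something the answer specifies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split `text` on every occurrence of `delim` using std::string::find.
// Empty tokens are kept, so "a,,b" yields {"a", "", "b"}.
// The name split_by_find is hypothetical, not taken from the answer.
std::vector<std::string> split_by_find(const std::string& text,
                                       const std::string& delim)
{
    assert(!delim.empty());  // an empty delimiter would never advance

    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos = text.find(delim, start);

    while (pos != std::string::npos) {
        tokens.push_back(text.substr(start, pos - start));
        start = pos + delim.size();      // continue just past the delimiter
        pos = text.find(delim, start);
    }
    tokens.push_back(text.substr(start));  // last token (may be empty)
    return tokens;
}
```

Calling split_by_find("The quick brown fox", " ") should give the same four words as the Java snippet in the question. Dropping empty tokens would only need an extra check before each push_back.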

Other suggestions

The Boost tokenizer class can make this sort of thing quite simple:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int, char**)
{
    string text = "token, test   string";

    char_separator<char> sep(", ");
    tokenizer< char_separator<char> > tokens(text, sep);
    BOOST_FOREACH (const string& t, tokens) {
        cout << t << "." << endl;
    }
}

Updated for C++11:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int, char**)
{
    string text = "token, test   string";

    char_separator<char> sep(", ");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const auto& t : tokens) {
        cout << t << "." << endl;
    }
}

Here's a real simple one:

#include <vector>
#include <string>
using namespace std;

vector<string> split(const char *str, char c = ' ')
{
    vector<string> result;

    do
    {
        const char *begin = str;

        while(*str != c && *str)
            str++;

        result.push_back(string(begin, str));
    } while (0 != *str++);

    return result;
}

Use strtok. In my opinion, there is no need to build a class around tokenizing unless strtok doesn't provide what you need. It might not, but in 15+ years of writing various parsing code in C and C++, I've always used strtok. Here is an example:

char myString[] = "The quick brown fox";
char *p = strtok(myString, " ");
while (p) {
    printf ("Token: %s\n", p);
    p = strtok(NULL, " ");
}

A few caveats (which might not suit your needs). The string is "destroyed" in the process, meaning that EOS characters are written in place of the delimiters. Correct usage may therefore require you to make a non-const copy of the string. You can also change the list of delimiters mid-parse.

In my opinion, the above code is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions that the language provides, and it does it well and cleanly. It's simply a "C based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)
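
The caveat about the string being "destroyed" can be worked around by tokenizing a private copy. The following is only a sketch of that idea; the helper name split_with_strtok is hypothetical, and the usual strtok limitations (shared internal state, not safe to interleave or use across threads) still apply.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenize a read-only string with strtok by working on a private copy.
// The helper name split_with_strtok is hypothetical.
std::vector<std::string> split_with_strtok(const std::string& text,
                                           const char* delims)
{
    // strtok overwrites delimiters with '\0', so copy the characters into a
    // mutable buffer and add the terminating '\0' ourselves.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* p = std::strtok(buffer.data(), delims); p != nullptr;
         p = std::strtok(nullptr, delims)) {
        tokens.push_back(p);  // copies the token out of the buffer
    }
    return tokens;
}
```

The caller's string is left untouched, at the cost of one extra copy. Like plain strtok, this skips runs of delimiters, so no empty tokens are produced.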

Another quick way is to use getline. Something like:

stringstream ss("bla bla");
string s;

while (getline(ss, s, ' ')) {
 cout << s << endl;
}

If you want, you can write a simple split() function that returns a vector<string>, which is really useful.
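
A possible shape for that split() function, as a hedged sketch: the name split_getline and its signature are assumptions, since the answer only describes the idea.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `text` on a single delimiter character using std::getline.
// The name split_getline is hypothetical.
std::vector<std::string> split_getline(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string item;

    // getline returns the stream, which converts to false once nothing
    // more can be extracted.
    while (std::getline(stream, item, delim)) {
        tokens.push_back(item);
    }
    return tokens;
}
```

Note that consecutive delimiters produce empty tokens with this approach, while a single trailing delimiter does not produce a final empty token.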

You can use streams, iterators, and the copy algorithm to do this fairly directly.

#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>

int main()
{
  std::string str = "The quick brown fox";

  // construct a stream from the string
  std::stringstream strstr(str);

  // use stream iterators to copy the stream to the vector as whitespace separated strings
  std::istream_iterator<std::string> it(strstr);
  std::istream_iterator<std::string> end;
  std::vector<std::string> results(it, end);

  // send the vector to stdout.
  std::ostream_iterator<std::string> oit(std::cout);
  std::copy(results.begin(), results.end(), oit);
}

No offense folks, but for such a simple problem, you are making things way too complicated. There are a lot of reasons to use Boost. But for something this simple, it's like hitting a fly with a 20# sledge.

void
split( vector<string> & theStringVector,  /* Altered/returned value */
       const  string  & theString,
       const  string  & theDelimiter)
{
    UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.

    size_t  start = 0, end = 0;

    while ( end != string::npos)
    {
        end = theString.find( theDelimiter, start);

        // If at end, use length=maxLength.  Else use length=end-start.
        theStringVector.push_back( theString.substr( start,
                       (end == string::npos) ? string::npos : end - start));

        // If at end, use start=maxSize.  Else use start=end+delimiter.
        start = (   ( end > (string::npos - theDelimiter.size()) )
                  ?  string::npos  :  end + theDelimiter.size());
    }
}

For example (for Doug's case),

#define SHOW(I,X)   cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl

int
main()
{
    vector<string> v;

    split( v, "A:PEP:909:Inventory Item", ":" );

    for (unsigned int i = 0;  i < v.size();   i++)
        SHOW( i, v[i] );
}

And yes, we could have split() return a new vector rather than passing one in. It's trivial to wrap and overload. But depending on what I'm doing, I often find it better to re-use pre-existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)

Reference: http://www.cplusplus.com/reference/string/string/.

(I was originally writing a response to Doug's question: C++ Strings Modifying and Extracting based on Separators (closed). But since Martin York closed that question with a pointer over here... I'll just generalize my code.)

Boost has a strong split function: boost::algorithm::split.

Sample program:

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

int main() {
    auto s = "a,b, c ,,e,f,";
    std::vector<std::string> fields;
    boost::split(fields, s, boost::is_any_of(","));
    for (const auto& field : fields)
        std::cout << "\"" << field << "\"\n";
    return 0;
}

Output:

"a"
"b"
" c "
""
"e"
"f"
""

A solution using regex_token_iterators:

#include <iostream>
#include <regex>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string str("The quick brown fox");

    regex reg("\\s+");

    sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
    sregex_token_iterator end;

    vector<string> vec(iter, end);

    for (auto a : vec)
    {
        cout << a << endl;
    }
}

I know you asked for a C++ solution, but you might find this helpful:

#include <QString>

...

QString str = "The quick brown fox"; 
QStringList results = str.split(" ");

The advantage over Boost in this example is that it's a direct one-to-one mapping to the code in your post.

See more in the Qt documentation.

Here is a sample tokenizer class that might do what you want:

//Header file
class Tokenizer 
{
    public:
        static const std::string DELIMITERS;
        Tokenizer(const std::string& str);
        Tokenizer(const std::string& str, const std::string& delimiters);
        bool NextToken();
        bool NextToken(const std::string& delimiters);
        const std::string GetToken() const;
        void Reset();
    protected:
        size_t m_offset;
        const std::string m_string;
        std::string m_token;
        std::string m_delimiters;
};

//CPP file
const std::string Tokenizer::DELIMITERS(" \t\n\r");

Tokenizer::Tokenizer(const std::string& s) :
    m_string(s), 
    m_offset(0), 
    m_delimiters(DELIMITERS) {}

Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
    m_string(s), 
    m_offset(0), 
    m_delimiters(delimiters) {}

bool Tokenizer::NextToken() 
{
    return NextToken(m_delimiters);
}

bool Tokenizer::NextToken(const std::string& delimiters) 
{
    size_t i = m_string.find_first_not_of(delimiters, m_offset);
    if (std::string::npos == i) 
    {
        m_offset = m_string.length();
        return false;
    }

    size_t j = m_string.find_first_of(delimiters, i);
    if (std::string::npos == j) 
    {
        m_token = m_string.substr(i);
        m_offset = m_string.length();
        return true;
    }

    m_token = m_string.substr(i, j - i);
    m_offset = j;
    return true;
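}

// GetToken and Reset are declared in the header but not shown above;
// these minimal definitions are an assumption about the intended behavior.
const std::string Tokenizer::GetToken() const
{
    return m_token;
}

void Tokenizer::Reset()
{
    m_offset = 0;
}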

Example:

std::vector <std::string> v;
Tokenizer s("split this string", " ");
while (s.NextToken())
{
    v.push_back(s.GetToken());
}

Here is a simple STL-only solution (~5 lines!) using std::string::find and std::string::find_first_not_of that handles repetitions of the delimiter (like spaces or periods, for instance), as well as leading and trailing delimiters:

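#include <string>
#include <vector>

// Assumption: the original snippet leaves DELIMITER undefined; a single space is used here.
const std::string DELIMITER = " ";

void tokenize(std::string str, std::vector<std::string> &token_v){
    size_t start = str.find_first_not_of(DELIMITER), end=start;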

    while (start != std::string::npos){
        // Find next occurence of delimiter
        end = str.find(DELIMITER, start);
        // Push back the token found into vector
        token_v.push_back(str.substr(start, end-start));
        // Skip all occurences of the delimiter to find new start
        start = str.find_first_not_of(DELIMITER, end);
    }
}

Try it live!

pystring is a small library that implements a bunch of Python's string functions, including the split method:

#include <string>
#include <vector>
#include "pystring.h"

std::vector<std::string> chunks;
pystring::split("this string", chunks);

// also can specify a separator
pystring::split("this-string", chunks, "-");

I posted this answer for a similar question.
Don't reinvent the wheel. I've used a number of libraries, and the fastest and most flexible one I have come across is the C++ String Toolkit Library.

Here is an example of how to use it that I've posted elsewhere on Stack Overflow.

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
       std::string s("Somewhere down the road");
       std::vector<std::string> result;
       if( strtk::parse( s, whitespace, result ) )
       {
           for(size_t i = 0; i < result.size(); ++i )
            std::cout << result[i] << std::endl;
       }
    }

    {  // parsing a string into a vector of floats with other separators
       // besides spaces

       std::string t("3.0, 3.14; 4.0");
       std::vector<float> values;
       if( strtk::parse( t, whitespace_and_punctuation, values ) ) // parse t, the string declared in this block
       {
           for(size_t i = 0; i < values.size(); ++i )
            std::cout << values[i] << std::endl;
       }
    }

    {  // parsing a string into specific variables

       std::string u("angle = 45; radius = 9.9");
       std::string w1, w2;
       float v1, v2;
       if( strtk::parse( u, whitespace_and_punctuation, w1, v1, w2, v2) ) // parse u, the string declared in this block
       {
           std::cout << "word " << w1 << ", value " << v1 << std::endl;
           std::cout << "word " << w2 << ", value " << v2 << std::endl;
       }
    }

    return 0;
}

Check this example. It might help you.

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main ()
{
    string tmps;
    istringstream is ("the dellimiter is the space");
    // Test the extraction itself, so a failed read at the end of input is never printed
    while (is >> tmps) {
        cout << tmps << "\n";
    }
    return 0;
}

MFC/ATL has a very nice tokenizer. From MSDN:

CAtlString str( "%First Second#Third" );
CAtlString resToken;
int curPos= 0;

resToken= str.Tokenize("% #",curPos);
while (resToken != "")
{
   printf("Resulting token: %s\n", resToken);
   resToken= str.Tokenize("% #",curPos);
};

Output

Resulting Token: First
Resulting Token: Second
Resulting Token: Third

You can simply use a regular expression library and solve this with regular expressions.

Use the expression (\w+) and the variable \1 (or $1, depending on the regular expression library's implementation).
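
With the standard library's std::regex (C++11), the (\w+) idea might be sketched as follows. The function name words_by_regex is an assumption; capture group 1 of each match plays the role of \1 / $1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every run of word characters in `text`.
// The name words_by_regex is hypothetical.
std::vector<std::string> words_by_regex(const std::string& text)
{
    static const std::regex word_re("(\\w+)");

    std::vector<std::string> words;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), end;
         it != end; ++it) {
        words.push_back((*it)[1].str());  // capture group 1 of the current match
    }
    return words;
}
```

Because the pattern matches tokens rather than delimiters, punctuation and repeated whitespace are skipped without any extra handling.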

If you're willing to use C, you can use the strtok function. You should pay attention to multi-threading issues when using it.

For simple stuff I just use the following:

unsigned TokenizeString(const std::string& i_source,
                        const std::string& i_seperators,
                        bool i_discard_empty_tokens,
                        std::vector<std::string>& o_tokens)
{
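    // find_first_of returns std::string::size_type; a narrower unsigned could never compare equal to npos
    std::string::size_type prev_pos = 0;
    std::string::size_type pos = 0;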
    unsigned number_of_tokens = 0;
    o_tokens.clear();
    pos = i_source.find_first_of(i_seperators, pos);
    while (pos != std::string::npos)
    {
        std::string token = i_source.substr(prev_pos, pos - prev_pos);
        if (!i_discard_empty_tokens || token != "")
        {
            o_tokens.push_back(i_source.substr(prev_pos, pos - prev_pos));
            number_of_tokens++;
        }

        pos++;
        prev_pos = pos;
        pos = i_source.find_first_of(i_seperators, pos);
    }

    if (prev_pos < i_source.length())
    {
        o_tokens.push_back(i_source.substr(prev_pos));
        number_of_tokens++;
    }

    return number_of_tokens;
}

Cowardly disclaimer: I write real-time data processing software where the data comes in through binary files, sockets, or some API call (I/O cards, cameras). I never use this function for anything more complicated or time-critical than reading external configuration files at startup.

Many overly complicated suggestions here. Try this simple std::string solution:

using namespace std;

string someText = ...

string::size_type tokenOff = 0, sepOff = tokenOff;
while (sepOff != string::npos)
{
    sepOff = someText.find(' ', sepOff);
    string::size_type tokenLen = (sepOff == string::npos) ? sepOff : sepOff++ - tokenOff;
    string token = someText.substr(tokenOff, tokenLen);
    if (!token.empty())
        /* do something with token */;
    tokenOff = sepOff;
}

I thought that was what the >> operator on string streams was for:

string word; sin >> word;

Adam Pierce's answer provides a hand-spun tokenizer taking in a const char*. It's a bit more problematic to do with iterators because incrementing a string's end iterator is undefined. That said, given string str{ "The quick brown fox" } we can certainly accomplish this:

auto start = find(cbegin(str), cend(str), ' ');
vector<string> tokens{ string(cbegin(str), start) };

while (start != cend(str)) {
    const auto finish = find(++start, cend(str), ' ');

    tokens.push_back(string(start, finish));
    start = finish;
}

Live Example

If you're looking to abstract away complexity by using standard functionality, as On Freund suggests, strtok is a simple option:

vector<string> tokens;

for (auto i = strtok(data(str), " "); i != nullptr; i = strtok(nullptr, " ")) tokens.push_back(i);

If you don't have access to C++17 you'll need to substitute data(str) as in this example: http://ideone.com/8kAGoa

Though not demonstrated in the example, strtok need not use the same delimiter for each token. Along with this advantage, though, there are several drawbacks:

strtok cannot be used on multiple strings at the same time: either a nullptr must be passed to continue tokenizing the current string, or a new char* to tokenize must be passed (there are some non-standard implementations which do support this, however, such as strtok_s)
For the same reason strtok cannot be used on multiple threads simultaneously (this may however be implementation defined, for example: Visual Studio's implementation is thread safe)
Calling strtok modifies the string it is operating on, so it cannot be used on const strings, const char*s, or literal strings; to tokenize any of these with strtok, or to operate on a string whose contents need to be preserved, str would have to be copied, and then the copy could be operated on
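
To illustrate the point that strtok need not use the same delimiter for each token, here is a small sketch that alternates between "=" and ";" to read key/value pairs. The function name parse_pairs and the input format are assumptions chosen for the illustration; it works on a copy because strtok modifies its input, and it does not handle empty values.

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Parse "key=value;key=value" text by switching strtok's delimiter per call.
// The name parse_pairs and the input format are hypothetical.
std::vector<std::pair<std::string, std::string>>
parse_pairs(const std::string& text)
{
    // Work on a mutable, '\0'-terminated copy of the input.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::pair<std::string, std::string>> pairs;
    char* key = std::strtok(buffer.data(), "=");   // a key ends at '='
    while (key != nullptr) {
        char* value = std::strtok(nullptr, ";");   // its value ends at ';'
        if (value == nullptr) {
            break;  // key without a value: stop
        }
        pairs.emplace_back(key, value);
        key = std::strtok(nullptr, "=");           // next key
    }
    return pairs;
}
```

For example, parse_pairs("angle=45;radius=9.9") is meant to produce the pairs ("angle", "45") and ("radius", "9.9").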

Neither of the previous methods can generate a tokenized vector in place, meaning that without abstracting them into a helper function they cannot initialize const vector<string> tokens. That functionality, and the ability to accept any white-space delimiter, can be harnessed using an istream_iterator. For example, given const string str{ "The quick \tbrown \nfox" } we can do this:

istringstream is{ str };
const vector<string> tokens{ istream_iterator<string>(is), istream_iterator<string>() };

Live Example

The required construction of an istringstream for this option has a far greater cost than the previous 2 options; however, this cost is typically hidden in the expense of string allocation.

If none of the above options are flexible enough for your tokenization needs, the most flexible option is a regex_token_iterator. Of course, with this flexibility comes greater expense, but again this is likely hidden in the string allocation cost. Say, for example, we want to tokenize based on non-escaped commas, also eating white-space. Given the following input: const string str{ "The ,qu\\,ick ,\tbrown, fox" } we can do this:

const regex re{ "\\s*((?:[^\\\\,]|\\\\.)*?)\\s*(?:,|$)" };
const vector<string> tokens{ sregex_token_iterator(cbegin(str), cend(str), re, 1), sregex_token_iterator() };

Live Example

Here's an approach that lets you control whether empty tokens are included (like strsep) or excluded (like strtok).

#include <string.h> // for strchr and strlen
#include <string>
#include <vector>

/*
 * want_empty_tokens==true  : include empty tokens, like strsep()
 * want_empty_tokens==false : exclude empty tokens, like strtok()
 */
std::vector<std::string> tokenize(const char* src,
                                  char delim,
                                  bool want_empty_tokens)
{
  std::vector<std::string> tokens;

  if (src and *src != '\0') // defensive
    while( true )  {
      const char* d = strchr(src, delim);
      size_t len = (d)? d-src : strlen(src);

      if (len or want_empty_tokens)
        tokens.push_back( std::string(src, len) ); // capture token

      if (d) src += len+1; else break;
    }

  return tokens;
}

It seems odd to me that, with all us speed-conscious nerds here on SO, no one has presented a version that uses a compile-time generated lookup table for the delimiter (example implementation further down). Using a lookup table and iterators should beat std::regex in efficiency; if you don't need to beat regex, just use it: it is standard as of C++11 and super flexible.

Some have suggested regex already, but for the noobs here is a packaged example that should do exactly what the OP expects:

#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
    std::smatch m{};
    std::vector<std::string> ret{};
    while (std::regex_search (it,end,m,e)) {
        ret.emplace_back(m.str());              
        std::advance(it, m.position() + m.length()); //next start position = match position + match length
    }
    return ret;
}
std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){  //comfort version calls flexible version
    return split(s.cbegin(), s.cend(), std::move(e));
}
int main ()
{
    std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
    auto v = split(str);
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    std::cout << "crazy version:" << std::endl;
    v = split(str, std::regex{"[^e]+"});  //using e as delim shows flexibility
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    return 0;
}

If we need to be faster and accept the constraint that all chars must be 8 bits, we can build a lookup table at compile time using metaprogramming:

template<bool...> struct BoolSequence{};        //just here to hold bools
template<char...> struct CharSequence{};        //just here to hold chars
template<typename T, char C> struct Contains;   //generic
template<char First, char... Cs, char Match>    //not first specialization
struct Contains<CharSequence<First, Cs...>,Match> :
    Contains<CharSequence<Cs...>, Match>{};     //strip first and increase index
template<char First, char... Cs>                //is first specialization
struct Contains<CharSequence<First, Cs...>,First>: std::true_type {}; 
template<char Match>                            //not found specialization
struct Contains<CharSequence<>,Match>: std::false_type{};

template<int I, typename T, typename U> 
struct MakeSequence;                            //generic
template<int I, bool... Bs, typename U> 
struct MakeSequence<I,BoolSequence<Bs...>, U>:  //not last
    MakeSequence<I-1, BoolSequence<Contains<U,I-1>::value,Bs...>, U>{};
template<bool... Bs, typename U> 
struct MakeSequence<0,BoolSequence<Bs...>,U>{   //last  
    using Type = BoolSequence<Bs...>;
};
template<typename T> struct BoolASCIITable;
template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
    /* could be made constexpr but not yet supported by MSVC */
    static bool isDelim(const char c){
        static const bool table[256] = {Bs...};
        return table[static_cast<unsigned char>(c)]; // unsigned char keeps the index in 0..255 even when char is signed
    }   
};
using Delims = CharSequence<'.',',',' ',':','\n'>;  //list your custom delimiters here
using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;

With that in place, writing a getNextToken function is easy:

template<typename T_It>
std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
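    // Table exposes the lookup as the static member isDelim, so wrap it in lambdas for find_if
    begin = std::find_if(begin,end,[](char c){ return !Table::isDelim(c); }); //find first non delim or end
    auto second = std::find_if(begin,end,[](char c){ return Table::isDelim(c); }); //find first delim or end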
    return std::make_pair(begin,second);
}

Using it is easy as well:

int main() {
    std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
    auto it = std::begin(s);
    auto end = std::end(s);
    while(it != std::end(s)){
        auto token = getNextToken(it,end);
        std::cout << std::string(token.first,token.second) << std::endl;
        it = token.second;
    }
    return 0;
}

Here is a live example: http://ideone.com/GKtkLQ

I know this question is already answered, but I want to contribute. Maybe my solution is a bit simple, but this is what I came up with:

vector<string> get_words(string const& text)
{
    vector<string> result;
    string tmp = text;

    size_t first_pos = 0;
    size_t second_pos = tmp.find(" ");

    while (second_pos != string::npos)
    {
        if (first_pos != second_pos)
        {
            string word = tmp.substr(first_pos, second_pos - first_pos);
            result.push_back(word);
        }
        tmp = tmp.substr(second_pos + 1);
        second_pos = tmp.find(" ");
    }

    result.push_back(tmp);

    return result;
}

Please comment if there is a better approach to something in my code or if something is wrong.

There is no direct way to do this. Refer to this Code Project source code to find out how to build a class for this.

You can take advantage of boost::make_find_iterator. Something similar to this:

template<typename CH>
inline vector< basic_string<CH> > tokenize(
    const basic_string<CH> &Input,
    const basic_string<CH> &Delimiter,
    bool remove_empty_token
    ) {

    typedef typename basic_string<CH>::const_iterator string_iterator_t;
    typedef boost::find_iterator< string_iterator_t > string_find_iterator_t;

    vector< basic_string<CH> > Result;
    string_iterator_t it = Input.begin();
    string_iterator_t it_end = Input.end();
    for(string_find_iterator_t i = boost::make_find_iterator(Input, boost::first_finder(Delimiter, boost::is_equal()));
        i != string_find_iterator_t();
        ++i) {
        if(remove_empty_token){
            if(it != i->begin())
                Result.push_back(basic_string<CH>(it,i->begin()));
        }
        else
            Result.push_back(basic_string<CH>(it,i->begin()));
        it = i->end();
    }
    if(it != it_end)
        Result.push_back(basic_string<CH>(it,it_end));

    return Result;
}

If the maximum length of the input string to be tokenized is known, one can exploit this and implement a very fast version. I am sketching the basic idea below, which was inspired by both strtok() and the "suffix array" data structure described in Jon Bentley's "Programming Pearls", 2nd edition, chapter 15. The C++ class in this case only provides some organization and convenience of use. The implementation shown can easily be extended to remove leading and trailing whitespace characters in the tokens.

Basically, one can replace the separator characters with string-terminating '\0' characters and set pointers to the tokens within the modified string. In the extreme case, when the string consists only of separators, one gets string-length plus 1 resulting empty tokens. It is practical to duplicate the string to be modified.

Header file:

#include <cassert>  // assert
#include <cstddef>  // size_t
#include <cstring>  // memset, strncpy

class TextLineSplitter
{
public:

    TextLineSplitter( const size_t max_line_len );

    ~TextLineSplitter();

    void            SplitLine( const char *line,
                               const char sep_char = ','
                             );

    inline size_t   NumTokens( void ) const
    {
        return mNumTokens;
    }

    const char *    GetToken( const size_t token_idx ) const
    {
        assert( token_idx < mNumTokens );
        return mTokens[ token_idx ];
    }

private:
    const size_t    mStorageSize;

    char           *mBuff;
    char          **mTokens;
    size_t          mNumTokens;

    inline void     ResetContent( void )
    {
        memset( mBuff, 0, mStorageSize );
        // mark all items as empty:
        memset( mTokens, 0, mStorageSize * sizeof( char* ) );
        // reset counter for found items:
        mNumTokens = 0L;
    }
};

Implementation file:

TextLineSplitter::TextLineSplitter( const size_t max_line_len ):
    mStorageSize ( max_line_len + 1L )
{
    // allocate memory
    mBuff   = new char  [ mStorageSize ];
    mTokens = new char* [ mStorageSize ];

    ResetContent();
}

TextLineSplitter::~TextLineSplitter()
{
    delete [] mBuff;
    delete [] mTokens;
}


void TextLineSplitter::SplitLine( const char *line,
                                  const char sep_char   /* = ',' */
                                )
{
    assert( sep_char != '\0' );

    ResetContent();
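    // The class has no mMaxLineLen member; mStorageSize is max_line_len + 1,
    // so copy at most mStorageSize - 1 chars and keep the final '\0' from ResetContent().
    strncpy( mBuff, line, mStorageSize - 1 );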

    size_t idx       = 0L; // running index for characters

    do
    {
        assert( idx < mStorageSize );

        const char chr = line[ idx ]; // retrieve current character

        if( mTokens[ mNumTokens ] == NULL )
        {
            mTokens[ mNumTokens ] = &mBuff[ idx ];
        } // if

        if( chr == sep_char || chr == '\0' )
        { // item or line finished
            // overwrite separator with a 0-terminating character:
            mBuff[ idx ] = '\0';
            // count-up items:
            mNumTokens ++;
        } // if

    } while( line[ idx++ ] );
}

A usage scenario would be:

// create an instance capable of splitting strings up to 1000 chars long:
TextLineSplitter spl( 1000 );
spl.SplitLine( "Item1,,Item2,Item3" );
for( size_t i = 0; i < spl.NumTokens(); i++ )
{
    printf( "%s\n", spl.GetToken( i ) );
}

Output:

Item1

Item2
Item3

boost::tokenizer is your friend, but consider making your code portable with respect to internationalization (i18n) issues by using wstring/wchar_t instead of the legacy string/char types.

#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

using namespace std;
using namespace boost;

typedef tokenizer<char_separator<wchar_t>,
                  wstring::const_iterator, wstring> Tok;

int main()
{
  wstring s;
  while (getline(wcin, s)) {
    char_separator<wchar_t> sep(L" "); // list of separator characters
    Tok tok(s, sep);
    for (Tok::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
      wcout << *beg << L"\t"; // output (or store in vector)
    }
    wcout << L"\n";
  }
  return 0;
}

Simple C++ code (standard C++98) that accepts multiple delimiters (specified in a std::string) and uses only vectors, strings and iterators.

#include <iostream>
#include <vector>
#include <string>
#include <stdexcept> 

std::vector<std::string> 
split(const std::string& str, const std::string& delim){
    std::vector<std::string> result;
    if (str.empty())
        throw std::runtime_error("Can not tokenize an empty string!");
    std::string::const_iterator begin, str_it;
    begin = str_it = str.begin(); 
    do {
        // check for end() first so the end iterator is never dereferenced
        while (str_it != str.end() && delim.find(*str_it) == std::string::npos)
            str_it++; // find the position of the first delimiter in str
        std::string token = std::string(begin, str_it); // grab the token
        if (!token.empty()) // empty token only when str starts with a delimiter
            result.push_back(token); // push the token into a vector<string>
        while (str_it != str.end() && delim.find(*str_it) != std::string::npos)
            str_it++; // ignore the additional consecutive delimiters
        begin = str_it; // process the remaining tokens
        } while (str_it != str.end());
    return result;
}

int main() {
    std::string test_string = ".this is.a.../.simple;;test;;;END";
    std::string delim = "; ./"; // string containing the delimiters
    std::vector<std::string> tokens = split(test_string, delim);           
    for (std::vector<std::string>::const_iterator it = tokens.begin(); 
        it != tokens.end(); it++)
            std::cout << *it << std::endl;
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow