Is there an easy way to convert HTML with multiple <br/> tags into proper surrounding <p> tags in JavaScript?

https://stackoverflow.com/questions/1275250

16-09-2019

Question

Let's say I have a bunch of HTML like below:

bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>

Is there an easy way with JavaScript to convert it to properly semantic <p> tags? For example:

<p>
  bla bla bla long paragraph here
</p>
<p>
  bla bla bla more paragraph text
</p>

Output spacing is not important; ideally it will work with any input spacing.

I'm thinking that I could try to cook up a regex, but before I do that I wanted to make sure I was a) avoiding a world of hurt and b) not missing something else out there. I tried a Google search, but haven't come up with anything yet.

Thanks for any advice!

Solution

I got bored. I'm sure there are optimizations / tweaks needed. It uses a little bit of jQuery to do its magic. It worked in FF3. And the answer to your question is that there isn't a very "simple" way :)

$(function() {
  $.fn.pmaker = function() {
    var brs = 0;
    var nodes = [];

    function makeP()
    {
      // only bother doing this if we have nodes to stick into a P
      if (nodes.length) {
        var p = $("<p/>");
        p.insertBefore(nodes[0]);  // insert a new P before the content
        p.append(nodes); // add the children        
        nodes = [];
      }
      brs=0;
    }

    this.contents().each(function() {    
      if (this.nodeType == 3) // text node 
      {
        // if the text has non whitespace - reset the BR counter
        if (/\S+/.test(this.data)) {
          nodes.push(this);
          brs = 0;
        }
      } else if (this.nodeType == 1) {
        if (/br/i.test(this.tagName)) {
          if (++brs == 2) {
            $(this).remove(); // remove this BR from the dom
            $(nodes.pop()).remove(); // delete the previous BR from the array and the DOM
            makeP();
          } else {
            nodes.push(this);
          }
        } else if (/^(?:p)$/i.test(this.tagName)) {
          // these tags force a P break, but don't scan within
          makeP();
        } else if (/^(?:div)$/i.test(this.tagName)) {
          // force a P break and scan within
          makeP();
          $(this).pmaker();
        } else {
          brs = 0; // some other tag - reset brs.
          nodes.push(this); // add the node 
          // specific nodes to not peek inside of - inline tags
          if (!(/^(?:b|i|strong|em|span|u)$/i.test(this.tagName))) {
            $(this).pmaker(); // peek inside for P needs            
          }
        } 
      } 
    });
    while ((brs--)>0) { // remove any extra BR's at the end
      $(nodes.pop()).remove();
    }
    makeP();
    return this;
  };

  // run it against something:
  $(function(){ 
    $("#worker").pmaker();
  });
});

And this was the HTML I tested against:

<div id="worker">
bla bla bla long <b>paragraph</b> here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>
this text should end up in a P
<div class='test'>
  and so should this
  <br/>
  <br/>
  and this<br/>without breaking at the single BR
</div>
and then we have the a "buggy" clause
<p>
  fear the real P!
</p>
and a trailing br<br/>
</div>

And the result:

<div id="worker"><p>
bla bla bla long <b>paragraph</b> here
</p>
<p>
bla bla bla more paragraph text
</p>
<p>
this text should end up in a P
</p><div class="test"><p>
  and so should this
  </p>
  <p>
  and this<br/>without breaking at the single BR
</p></div><p>
and then we have the a "buggy" clause
</p><p>
  fear the real P!
</p><p>
and a trailing br</p>
</div>

Other answers

Scan each of the child elements + text of the enclosing element. Each time you encounter a "br" element, create a "p" element, and append all the pending stuff to it. Lather, rinse, repeat.

Don't forget to remove the stuff you are relocating to a new "p" element.

I have found this library (prototype.js) to be useful for this sort of thing.
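
The scan described above could be sketched with plain DOM calls rather than prototype.js. This is only an illustration: the names `groupAtBreaks` and `wrapInParagraphs` are made up here, and the sketch assumes the container holds only text and inline elements (wrapping a nested `div` or `p` in a new `p` would not be valid HTML).

```javascript
// Split an array-like of child nodes into groups, ending a group at every <br>.
// Only nodeType, nodeName and data are read, so plain objects work too.
function groupAtBreaks(childNodes) {
  var groups = [];
  var pending = [];

  function flush() {
    // Keep the pending group only if it holds an element or non-whitespace text.
    var hasContent = pending.some(function (node) {
      return node.nodeType === 1 || (node.nodeType === 3 && /\S/.test(node.data));
    });
    if (hasContent) {
      groups.push(pending);
    }
    pending = [];
  }

  Array.prototype.forEach.call(childNodes, function (node) {
    if (node.nodeType === 1 && node.nodeName.toLowerCase() === "br") {
      flush(); // a <br> closes the current group and is not kept
    } else {
      pending.push(node);
    }
  });
  flush(); // whatever follows the last <br>
  return groups;
}

// Browser-only part: empty the container, then re-add each group inside a new <p>.
function wrapInParagraphs(container) {
  var groups = groupAtBreaks(container.childNodes);
  while (container.firstChild) {
    container.removeChild(container.firstChild);
  }
  groups.forEach(function (group) {
    var p = container.ownerDocument.createElement("p");
    group.forEach(function (node) {
      p.appendChild(node);
    });
    container.appendChild(p);
  });
}
```

In a page this might be called as `wrapInParagraphs(document.getElementById("content"))`, where `content` is a hypothetical id. Note that this sketch breaks at every `br`, as the answer describes, so single line breaks are not preserved.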

I'm assuming you're not really allowing any other... Sometimes you need to preserve single line breaks (not all <br /> elements are bad), and you only want to turn double instances of <br /> into paragraph breaks.

When doing so I would:

Remove all line breaks
Wrap the whole lot in a paragraph
Replace <br /><br /> with </p>\n<p>
Lastly, remove any empty <p></p> elements that might have been generated

So the code could be something like:

var ConvertToParagraphs = function(text) {
    var lineBreaksRemoved = text.replace(/\n/g, "");
    var wrappedInParagraphs = "<p>" + lineBreaksRemoved + "</p>";
    var brsRemoved = wrappedInParagraphs.replace(/<br[^>]*>[\s]*<br[^>]*>/gi, "</p>\n<p>");
    var emptyParagraphsRemoved = brsRemoved.replace(/<p><\/p>/g, "");
    return emptyParagraphsRemoved;
}

Note: I've been extremely verbose to show the steps; you would simplify it, of course.

This turns your sample:

bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>

Into:

<p>bla bla bla long paragraph here</p>
<p>bla bla bla more paragraph text</p>

But it does so without removing any <br /> elements that you may actually want.
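
As the note suggests, the verbose steps can be chained. The following is a condensed sketch of ConvertToParagraphs under a hypothetical name, `toParagraphs`, with an input chosen to show that a single line break survives while a double one becomes a paragraph boundary:

```javascript
// Condensed form of the steps in ConvertToParagraphs (the name is illustrative).
function toParagraphs(text) {
  return ("<p>" + text.replace(/\n/g, "") + "</p>")    // strip newlines, wrap once
    .replace(/<br[^>]*>\s*<br[^>]*>/gi, "</p>\n<p>")   // double <br> -> paragraph break
    .replace(/<p><\/p>/g, "");                         // drop empty paragraphs
}

var html = "first line<br/>second line<br/><br/>next paragraph";
console.log(toParagraphs(html));
// <p>first line<br/>second line</p>
// <p>next paragraph</p>
```

Because this works on the markup as a string, it is only as reliable as the regular expressions; nested block elements inside the input are not handled.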

I'd do it in several stages:

RegExp: Convert all BR tags to line breaks
RegExp: Strip out all the white-space
RegExp: Convert the multiple line breaks to single ones
Use String's split('\n') on the result.

That should give an array with all the 'real' paragraphs (in theory). Then you can just iterate through it and wrap each line in P tags.
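
A minimal sketch of these stages, with a hypothetical function name `brToParagraphs`. Two assumptions are made here: "strip out all the white-space" is read as removing whitespace around the line breaks (removing all of it would glue words together), and that stage is merged with the collapsing of repeated breaks into one regular expression.

```javascript
// Staged conversion: <br> -> newline, tidy whitespace, split, wrap each line.
function brToParagraphs(html) {
  return html
    .replace(/<br\s*\/?>/gi, "\n")  // stage 1: every <br> becomes a line break
    .replace(/\s*\n\s*/g, "\n")     // stages 2 and 3: trim around breaks, collapse runs
    .trim()
    .split("\n")                    // stage 4: one entry per paragraph
    .filter(function (line) {
      return line.length > 0;       // an empty input yields no paragraphs
    })
    .map(function (line) {
      return "<p>" + line + "</p>";
    })
    .join("\n");
}

console.log(brToParagraphs("one<br>two<br><br>three"));
// <p>one</p>
// <p>two</p>
// <p>three</p>
```

As the output shows, this approach also breaks at single `br` tags, and any newline already present in the input counts as a break too, so it suits input where every break is meant to separate paragraphs.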

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow
