How can I convert a Docx to HTML using asp.net?

https://stackoverflow.com/questions/55113

09-06-2019

Question

Word 2007 saves its documents in .docx format, which is really a zip file with a bunch of stuff in it, including an XML file with the document.

I want to be able to take a .docx file, drop it into a folder in my asp.net web app, and have the code open the .docx file and render the (XML part of the) document as a web page.

I've been searching the web for more information on this, but so far haven't found much. My questions are:

Would you (a) use XSLT to transform the XML to HTML, (b) use the XML manipulation libraries in .NET (such as XDocument and XElement in 3.5) to convert to HTML, or (c) something else?
Do you know of any open source libraries / projects that have done this that I could use as a starting point?

Thanks!

Solution

Try this post? I'm not sure, but it might be what you are looking for.

Other tips

I wrote mammoth.js, which is a JavaScript library that converts docx files to HTML. If you want to do the rendering server-side in .NET, there is also a .NET version of Mammoth available on NuGet.

Mammoth tries to produce clean HTML by looking at semantic information: for instance, it maps paragraph styles in Word (such as Heading 1) onto appropriate tags and styles in HTML/CSS (such as <h1>). If you want something that produces an exact visual copy, then Mammoth probably isn't for you. If you have a document that is already well structured and want to convert it to tidy HTML, Mammoth might do the trick.
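
For the server-side .NET route mentioned above, a minimal sketch could look like the following. It assumes the Mammoth NuGet package is referenced; the file name document.docx and the Word style name "Section Title" are placeholders for illustration, not something from the question.

```csharp
using System;
using Mammoth;

class MammothExample
{
    static void Main(string[] args)
    {
        // Placeholder input path; pass your own .docx file as the first argument.
        string path = args.Length > 0 ? args[0] : "document.docx";

        var converter = new DocumentConverter()
            // Optional style map: paragraphs with the (assumed) Word style
            // "Section Title" are emitted as <h1>.
            .AddStyleMap("p[style-name='Section Title'] => h1:fresh");

        var result = converter.ConvertToHtml(path);

        // result.Value holds an HTML fragment, not a complete page.
        Console.WriteLine(result.Value);

        // Warnings report things Mammoth could not map, such as unknown styles.
        foreach (var warning in result.Warnings)
        {
            Console.Error.WriteLine("warning: " + warning);
        }
    }
}
```

In an asp.net page the fragment in result.Value would typically be written into a placeholder control rather than to the console.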

Word 2007 has an API that you can use to convert to HTML. Here's a post that talks about it: http://msdn.microsoft.com/en-us/magazine/cc163526.aspx . You can find documentation around the API, but I remember that there is a convert-to-HTML function in it.

This code will help to convert a .docx file to text

function read_file_docx($filename){

    // Bail out early if the file is missing.
    if (!$filename || !file_exists($filename)) return false;

    // A .docx file is a zip archive; the body text lives in word/document.xml.
    // ZipArchive replaces the zip_open()/zip_read() functions, which are deprecated in PHP 8.
    $zip = new ZipArchive();

    if ($zip->open($filename) !== true) return false;

    $content = $zip->getFromName('word/document.xml');

    $zip->close();

    if ($content === false) return false;

    // Separate table cells with a space and paragraphs with a line break.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);

    // Drop all remaining XML tags.
    $striped_content = strip_tags($content);

    // Keep only a conservative set of characters; note that this also drops non-ASCII letters.
    $striped_content = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/","",$striped_content);

    echo nl2br($striped_content);
}
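
Since the question is about asp.net, the same idea can be sketched in C# with the XDocument approach the question lists as option (b). This is only an illustration under stated assumptions: the class and method names are made up for this example, the "Heading1" to "Heading6" style ids are the defaults of an English Word installation, and tables, images and run formatting are ignored. On .NET Framework, ZipFile needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper class, named for this example only.
static class DocxToHtmlSketch
{
    // Main WordprocessingML namespace used inside word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads the main document part out of the .docx zip container.
    public static string ReadDocumentXml(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");
            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Emits <h1>..<h6> for paragraphs whose style id is "Heading1".."Heading6"
    // (an assumption about the document's styles) and <p> for everything else.
    public static string ParagraphsToHtml(string documentXml)
    {
        var html = new StringBuilder();
        foreach (XElement p in XDocument.Parse(documentXml).Descendants(W + "p"))
        {
            // A paragraph's text is spread over w:t elements inside its runs.
            string text = string.Concat(p.Descendants(W + "t").Select(t => t.Value));
            string style = (string)p.Element(W + "pPr")
                                    ?.Element(W + "pStyle")
                                    ?.Attribute(W + "val") ?? "";
            string tag = "p";
            if (style.Length == 8
                && style.StartsWith("Heading", StringComparison.Ordinal)
                && style[7] >= '1' && style[7] <= '6')
            {
                tag = "h" + style[7];
            }
            html.Append('<').Append(tag).Append('>')
                .Append(WebUtility.HtmlEncode(text))
                .Append("</").Append(tag).Append(">\n");
        }
        return html.ToString();
    }
}
```

A page could then call DocxToHtmlSketch.ParagraphsToHtml(DocxToHtmlSketch.ReadDocumentXml(path)) and write the result into the response. Splitting the zip step from the XML step keeps the mapping logic usable on XML that comes from somewhere else.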

I use Interop. It is a bit problematic, but it works well in most cases.

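using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.Word;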

This one returns the list of paths of the documents converted to HTML

// sourceFilePath, destinationFilePath and IsError are members of the containing class (not shown).
public List<string> GetHelpDocuments()
    {
        List<string> lstHtmlDocuments = new List<string>();
        string[] validExtensions = { ".doc", ".docx" };

        // "" is a placeholder: pass the folder that holds the Word documents.
        foreach (string _sourceFilePath in Directory.GetFiles(""))
        {
            // Compare extensions case-insensitively so ".DOCX" is accepted too.
            if (!validExtensions.Contains(System.IO.Path.GetExtension(_sourceFilePath).ToLowerInvariant()))
                continue;

            sourceFilePath = _sourceFilePath;
            // ChangeExtension only touches the extension; string.Replace could also hit a folder name.
            destinationFilePath = System.IO.Path.ChangeExtension(_sourceFilePath, ".html");

            if (!System.IO.File.Exists(destinationFilePath))
            {
                // No HTML version yet.
                ConvertToHtml();
            }
            else if (System.IO.File.GetLastWriteTimeUtc(destinationFilePath) < System.IO.File.GetLastWriteTimeUtc(sourceFilePath))
            {
                // The HTML version exists but is older than the Word document, so rebuild it.
                System.IO.File.Delete(destinationFilePath);
                ConvertToHtml();
            }

            lstHtmlDocuments.Add(destinationFilePath);
        }
        return lstHtmlDocuments;
    }

And this one converts the doc to HTML.

private void ConvertToHtml()
    {
        IsError = false;
        if (!System.IO.File.Exists(sourceFilePath)) return;

        Microsoft.Office.Interop.Word.Application docApp = null;
        Document doc = null;
        try
        {
            docApp = new Microsoft.Office.Interop.Word.Application();
            // Keep Word hidden and silent: on a server nobody can answer its dialogs.
            docApp.Visible = false;
            docApp.DisplayAlerts = WdAlertLevel.wdAlertsNone;

            object fileFormat = WdSaveFormat.wdFormatHTML;
            doc = docApp.Documents.Open(sourceFilePath);
            doc.SaveAs2(destinationFilePath, fileFormat);
        }
        catch
        {
            IsError = true;
        }
        finally
        {
            try
            {
                // Close the document before quitting so Word does not keep the file locked.
                if (doc != null) doc.Close(SaveChanges: false);
                if (docApp != null) docApp.Quit(SaveChanges: false);
            }
            catch { }
            finally
            {
                // Last resort: this kills every WINWORD process on the machine, not only ours.
                foreach (Process p in Process.GetProcessesByName("WINWORD"))
                {
                    p.Kill();
                }
            }
            // docApp is still null if Word failed to start, so guard the release.
            if (docApp != null) Marshal.ReleaseComObject(docApp);
            docApp = null;
            GC.Collect();
        }
    }

Killing Word is no fun, but you can't let it hang there and block others, right?

In the web page I render the HTML in an iframe.

There is a dropdown that contains the list of help documents. The value is the path to the HTML version of the document and the text is the name of the document.

private void BindHelpContents()
    {
        HelpDocuments hDoc = new HelpDocuments(Server.MapPath("~/HelpDocx/docx/"));
        List<string> lstHelpDocuments = hDoc.GetHelpDocuments();
        int index = 1;
        ddlHelpDocuments.Items.Insert(0, new ListItem { Value = "0", Text = "---Select Document---", Selected = true });

        foreach (string strHelpDocument in lstHelpDocuments)
        {
            // Value is the physical path of the HTML file; Text is the file name without ".html".
            ddlHelpDocuments.Items.Insert(index, new ListItem { Value = strHelpDocument, Text = System.IO.Path.GetFileNameWithoutExtension(strHelpDocument) });
            index++;
        }
        FetchDocuments();

    }

On selected index changed, it is rendered to the frame

    protected void RenderHelpContents(object sender, EventArgs e)
    {
        try
        {
            if (ddlHelpDocuments.SelectedValue == "0") return;
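            string strHtml = ddlHelpDocuments.SelectedValue;
            // Turn the physical path back into an app-relative virtual path ("~/...") with forward slashes.
            string newaspxpage = strHtml.Replace(Server.MapPath("~/"), "~/").Replace('\\', '/');
            string pageVirtualPath = VirtualPathUtility.ToAbsolute(newaspxpage);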
            documentholder.Attributes["src"] = pageVirtualPath;
        }
        catch
        {
            lblGError.Text = "Selected document doesn't exist, please refresh the page and try again. If that doesn't help, please contact Support";
        }
    }

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow