Monday 19 October 2015

A couple of options for server-side Word document generation

Traditionally, generating Word documents on the server has involved installing Word on the server and using COM interop to generate the documents.  This is unsupported and has licensing issues, although many have tried it and got it to work.

Newer versions of Word use XML-based file formats.  Word 2013 uses Office Open XML (OOXML), an XML-based format developed by Microsoft.

Challenge

The client was automating an Office 7 client, running as a logged-in user, to generate documents from templates.  They had upgraded Office to 2013 and were looking for a way to generate the documents from a proper service.

Simple Substitution

The first option was to do simple string substitution.  A quick Google (other search engines are available) found this solution: “Use OpenXML to create a Word document from a docx template”.

I downloaded and installed the Open XML SDK and toolkit.

The first thing I did was try to convert the existing Word document, so I opened it with Word 2013 and saved it as docx.  I was surprised that I couldn’t open it with Visual Studio.  I was expecting a big lump of XML, but it turns out that a docx is in fact a zipped directory of files, and those files contain the XML.

You can view the internal files by renaming the docx to zip and unzipping.
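The same inspection can be done from code without renaming anything. The following is a minimal sketch, assuming .NET 4.5 or later with a reference to System.IO.Compression; the class name DocxInspector and the method ListParts are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: lists the files (parts) stored inside a docx package.
public static class DocxInspector
{
    // Reads the zip directory from a stream and returns every entry name.
    public static List<string> ListParts(Stream docxStream)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }

    // Convenience overload that opens the docx from disk.
    public static List<string> ListParts(string docxPath)
    {
        using (FileStream stream = File.OpenRead(docxPath))
        {
            return ListParts(stream);
        }
    }
}
```

Calling DocxInspector.ListParts on a Word document would normally show entries such as [Content_Types].xml and word/document.xml, the latter being the main document part that holds the body text.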

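using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

public static class WordDocumentGenerator
{
    public static void SearchAndReplace(string document,
        string output,
        Dictionary<string, string> dict)
    {
        // Note: the template is copied first because a docx is actually
        // a zip of many files and only the main document part is rewritten.
        // For production we would want to delete the copy on failure.
        // Any previous output is overwritten so the method can be re-run.
        File.Copy(document, output, true);

        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(output, true))
        {
            // Read the whole main document part into a string.
            string docText;
            using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
            {
                docText = sr.ReadToEnd();
            }

            // Each dictionary key is treated as a regular expression pattern.
            foreach (KeyValuePair<string, string> item in dict)
            {
                docText = Regex.Replace(docText, item.Key, item.Value);
            }

            // FileMode.Create truncates the part before the new text is written.
            using (StreamWriter sw = new StreamWriter(
                wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
            {
                sw.Write(docText);
            }
        }
    }
}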

I pretty much copied the code from the article above into a utility class. The only change I made was to introduce a destination file.  You’ll notice I copy the source to the destination first.  Initially I tried writing just the amended data, but that is only one file from the entire docx structure.  For production I would use an intermediate location to ensure onward processing does not pick up a file whose substitutions have failed or have not yet been done.
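The intermediate location idea could look something like the sketch below. It is only an illustration under stated assumptions: the class SafeFileWriter, the method CopyTransformAndPublish and the stagingDirectory parameter are hypothetical names, and the transform delegate stands in for whatever step edits the copied docx in place.

```csharp
using System;
using System.IO;

// Hypothetical helper: work on a staged copy and publish it only on success.
public static class SafeFileWriter
{
    public static void CopyTransformAndPublish(string source,
        string destination,
        string stagingDirectory,
        Action<string> transform)
    {
        // Keep the original extension so tools that check it still work.
        string stagedPath = Path.Combine(stagingDirectory,
            Guid.NewGuid().ToString("N") + Path.GetExtension(destination));
        try
        {
            File.Copy(source, stagedPath);
            transform(stagedPath);

            // Publish only after the transform finished without throwing.
            if (File.Exists(destination))
            {
                File.Delete(destination);
            }
            File.Move(stagedPath, destination);
        }
        finally
        {
            // After a successful move this is a no-op; after a failure it
            // removes the half-finished staged copy.
            if (File.Exists(stagedPath))
            {
                File.Delete(stagedPath);
            }
        }
    }
}
```

The staging directory is assumed to exist already and to be ignored by whatever process picks up the finished documents. If it sits on a different volume from the destination, the final move is a copy followed by a delete rather than a single rename.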

When I ran the test it complained about the document format.

So I tried to open the docx using the “Open XML Productivity Tool”, and it also complained.  I then created an empty document and copied and pasted the contents into it.  The test code is below.

// Requires: using System.Collections.Generic; using System.IO;
//           using Microsoft.VisualStudio.TestTools.UnitTesting;

// TODO: work out how to read these from config in a test.
private static string routeInputDir = @"C:\DocService\TEMPLATES";
private static string routeOutputDir = @"C:\DocService\TEMPLATES";

/// <summary>
/// This tests that the document was generated using substitution.
/// </summary>
[TestMethod]
public void BasicTestForm63ABySubstitution()
{
    string inputFile = Path.Combine(routeInputDir, "FORM 63A.docx");
    string outputFile = Path.Combine(routeOutputDir, "My document test out.docx");

    WordDocumentGenerator.SearchAndReplace(inputFile, outputFile, GetSubstDict());

    // Check that the output document was actually written.
    Assert.IsTrue(File.Exists(outputFile));
}

private static Dictionary<string, string> GetSubstDict()
{
    // Keys are regular expressions, so the square brackets are escaped.
    return new Dictionary<string, string>
    {
        { "\\[Lands of\\]", "Stuart McLean" },
        { "\\[Registered Owner\\]", "Stuart " }
    };
}

 

Quite simply, it worked.  As you can see, I had to escape the square brackets in the keys because each key is used as a regex pattern.
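The manual escaping can be avoided by letting the framework escape the keys. The sketch below is one possible variation, not the code from the article: PlaceholderReplacer is a hypothetical name, Regex.Escape turns each literal placeholder into a safe pattern, and SecurityElement.Escape is used on the assumption that the values are plain text being inserted into raw XML.

```csharp
using System.Collections.Generic;
using System.Security;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces literal placeholders in raw document XML.
public static class PlaceholderReplacer
{
    public static string Replace(string xmlText, IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> item in values)
        {
            // Treat the key as literal text, so "[Lands of]" needs no backslashes.
            string pattern = Regex.Escape(item.Key);

            // Escape &, <, >, " and ' so the value cannot break the XML.
            string safeValue = SecurityElement.Escape(item.Value);

            // A MatchEvaluator inserts the value as-is; with a plain string
            // replacement, sequences such as $1 would be interpreted by Regex.
            xmlText = Regex.Replace(xmlText, pattern, m => safeValue);
        }
        return xmlText;
    }
}
```

With this helper the dictionary keys could be written as "[Lands of]" and "[Registered Owner]". A value such as "Smith & Sons" would be written into the XML as "Smith &amp;amp; Sons", which Word then displays as the original text.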

Using the Open XML Productivity Tool

I opened the document using the tool, pressed the “Reflect Code” button and copied the generated code into a new class.

I added a member for the substitution parameters and changed the CreatePackage method to take the dictionary as an extra parameter.

 

// Substitution values used by the generated part-building methods.
private Dictionary<string, string> m_dict;

// Creates a WordprocessingDocument.
public void CreatePackage(string filePath,
    Dictionary<string, string> dict)
{
    m_dict = dict;
    using (WordprocessingDocument package = WordprocessingDocument.Create(filePath, WordprocessingDocumentType.Document))
    {
        CreateParts(package);
    }
}

I then searched the generated code for the text I wanted to replace and set it from the relevant dictionary entry instead.

text7.Text = "Registered Owner";

Becomes

text7.Text = m_dict["Registered Owner"];
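A direct indexer lookup throws a KeyNotFoundException if the caller forgot to supply a value. A small wrapper can make that failure easier to diagnose. This is only a sketch, and the class SubstitutionValues and the method Require are hypothetical names, not part of the generated code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: dictionary lookup with a descriptive error message.
public static class SubstitutionValues
{
    public static string Require(IDictionary<string, string> values, string key)
    {
        string value;
        if (!values.TryGetValue(key, out value))
        {
            throw new ArgumentException(
                "No substitution value supplied for '" + key + "'.", "values");
        }
        return value;
    }
}
```

The generated assignment could then be written as text7.Text = SubstitutionValues.Require(m_dict, "Registered Owner");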

The test then simply creates the class and calls the method:

private static Dictionary<string, string> GetSubstDict()
{
    // The generated class looks keys up literally, for example
    // m_dict["Registered Owner"], so each key must match exactly.
    return new Dictionary<string, string>
    {
        { "\\[Lands of\\]", "Stuart McLean" },
        { "Registered Owner", "Stuart " },
        { "\\[Name\\]", "Jo Bloggs" }
    };
}

/// <summary>
/// This tests that the document was generated.
/// </summary>
[TestMethod]
public void Form63AByGenerated()
{
    Form63A form = new Form63A();
    string outputFile = Path.Combine(routeOutputDir, "documentgenout.docx");

    form.CreatePackage(outputFile, GetSubstDict());

    // Check that the output document was actually written.
    Assert.IsTrue(File.Exists(outputFile));
}

Conclusion

Both methods are pretty simple. 

Simple substitution is probably easier to maintain if the templates are changing frequently but may have issues with regexes and complex substitutions (tables, images etc.).

Generated code is harder to maintain if the templates change frequently but will be able to handle complex substitutions.  Also, much of the content is common (styles etc.) so this could probably be hived off to a common class to make this easier to maintain.
