Extract Microsoft Word Images Programmatically


By rashid sarwar
  

This tutorial shows how to programmatically extract the images from a Microsoft Word document into a folder, using C# and the Aspose.Words .NET component.



How to manipulate Word images programmatically

 

Working with Microsoft Word images programmatically has never been so easy. With the Aspose.Words .NET component, it has become much easier to manipulate the images in a Word DOC document.

 

In one of my projects, I was given a Word DOC file supplied by the user and had to extract all the images in it and save them as files in a folder. The purpose was to give instant access to every image in the DOC document without opening it in Microsoft Word, so that the user could open and work with the images in any image viewer.

 

The first problem was extracting all the images contained in the document. I had to use the component from C#, so all the code discussed here is in C# syntax.

 

Step 1: Create a new C# project


To create a new project, choose File > New > Project from the main menu.

The dialog offers several options. First, select a project type on the left side of the dialog: Visual Basic Projects or Visual C# Projects, depending on the language you plan to use. This tutorial uses C#, so choose Visual C# Projects.

Next, choose a template on the right side. You may choose Windows Application, ASP.NET Web Application, or any other template that suits your application. I used the ConsoleApplication template for this tutorial, so select it as well.

When you create a project from the ConsoleApplication template, VS.NET adds a sample file by default, so you can build the new project right away.


Step 2: Add a reference to the Aspose.Words assembly in the project

Project references are added through the Add Reference dialog box, which can be opened from the Project menu.

 

To add the Aspose.Words project reference:

  1. In Solution Explorer, select the project.
  2. On the Project menu, choose Add Reference.

The Add Reference dialog box opens.

  1. On the .NET tab, select the Aspose.Words component.
  2. Click OK.

The Aspose.Words reference now appears under the References node of the project.

 

 

Step 3: Open an existing DOC document

The Aspose.Words library is built around the Document class, which can load documents in many formats. The file I had to read was in Microsoft Office DOC format. I stored the folder path in a string variable, ImageFilePath, concatenated it with the file name, and passed the result to the Document constructor. The following lines of code open the file.

//open an existing DOC document using the Document object class

string ImageFilePath  = "c:\\imagefolder";

Document doc = new Document(ImageFilePath + "\\ImageFile.doc");
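As a variation on the lines above, here is a minimal sketch that builds the path with Path.Combine and checks that the file exists before opening it. The class name OpenDocumentSketch is hypothetical, and the sketch assumes the project already references Aspose.Words as described in Step 2.

```csharp
using System;
using System.IO;
using Aspose.Words;

// Hypothetical class name, used only for this illustration.
class OpenDocumentSketch
{
    static void Main()
    {
        // Same folder and file name as in the tutorial.
        string ImageFilePath = "c:\\imagefolder";

        // Path.Combine inserts the directory separator for us.
        string docPath = Path.Combine(ImageFilePath, "ImageFile.doc");

        // Stop early with a clear message if the input file is missing.
        if (!File.Exists(docPath))
        {
            Console.WriteLine("Input document not found: " + docPath);
            return;
        }

        Document doc = new Document(docPath);
        Console.WriteLine("Opened " + docPath);
    }
}
```

Checking for the file first is optional; it only turns a library exception into a friendlier message.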


Step 4: Access the images in the document

Next, I had to access the images contained in the doc object. The Document object follows a DOM (Document Object Model) design and exposes the document as a tree of nodes, so accessing the images was fairly easy: I called the GetChildNodes method on the document and asked it for the nodes of type Shape. Passing true as the second argument makes the search cover the whole tree rather than only the immediate children of the document. The NodeCollection class represents a collection of nodes of a specific type.

 

//It gives a collection of all shape nodes in the tree

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true, false);
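To see what this call returns for a given document, the following minimal sketch prints how many shape nodes were found and how many of them carry an image. The class name CountShapesSketch is hypothetical; the file path is the one used in this tutorial, and the three-argument GetChildNodes call is taken from the article, so it may need adjusting if your Aspose.Words version offers a different overload.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

// Hypothetical class name, used only for this illustration.
class CountShapesSketch
{
    static void Main()
    {
        Document doc = new Document("c:\\imagefolder\\ImageFile.doc");

        // Same call as in the article: all Shape nodes at any depth of the tree.
        NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true, false);

        // Not every shape holds a picture, so count those that do.
        int withImage = 0;
        foreach (Shape shape in shapes)
        {
            if (shape.HasImage)
            {
                withImage++;
            }
        }

        Console.WriteLine("Shapes found: " + shapes.Count);
        Console.WriteLine("Shapes with an image: " + withImage);
    }
}
```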


Step 5: Iterate through the node collection

Next, I had to iterate through the node collection and save every shape that contains an image. Here is the code for doing it.

int imageIndex = 0;

foreach (Shape shape in shapes)
{
    if (shape.HasImage)
    {
        String name = "DocumentImage" + "_" + imageIndex.ToString() + ".bmp";
        shape.ImageData.Save(ImageFilePath + "\\" + name);
        imageIndex++;
    }
}

After executing the code, I could see all the image files in the folder c:\imagefolder.

The folder contents before executing the code are shown in a figure available as a zip file at this link: figure1

The folder contents after executing the code are shown in a figure available as a zip file at this link: figure2
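The loop in Step 5 gives every file a .bmp extension. If the images in your document are stored in other formats, the extension may not match the actual file contents. The following is a minimal sketch of one way to choose the extension from the stored image type. It assumes that your Aspose.Words version exposes the ImageData.ImageType property and the ImageType enumeration values used below. ExtensionFor and ExtractWithExtensionSketch are hypothetical names introduced for this illustration, not part of the library.

```csharp
using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

// Hypothetical class name, used only for this illustration.
class ExtractWithExtensionSketch
{
    // Hypothetical helper: maps the stored image type to a file extension.
    static string ExtensionFor(ImageType type)
    {
        switch (type)
        {
            case ImageType.Jpeg: return ".jpg";
            case ImageType.Png:  return ".png";
            case ImageType.Bmp:  return ".bmp";
            case ImageType.Emf:  return ".emf";
            case ImageType.Wmf:  return ".wmf";
            case ImageType.Pict: return ".pict";
            default:             return ".bin"; // unrecognized type: neutral extension
        }
    }

    static void Main()
    {
        string ImageFilePath = "c:\\imagefolder";
        Document doc = new Document(Path.Combine(ImageFilePath, "ImageFile.doc"));

        // Three-argument overload as used in the article; adjust if your
        // version only offers GetChildNodes(NodeType, bool).
        NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true, false);

        int imageIndex = 0;
        foreach (Shape shape in shapes)
        {
            if (!shape.HasImage)
            {
                continue;
            }

            string name = "DocumentImage_" + imageIndex
                + ExtensionFor(shape.ImageData.ImageType);
            shape.ImageData.Save(Path.Combine(ImageFilePath, name));
            imageIndex++;
        }

        Console.WriteLine(imageIndex + " image(s) saved.");
    }
}
```

The rest of the logic is the same as in Step 5; only the file name changes.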

Here is the complete code, defined in the Class1.cs file:

using System;
using Aspose.Words;
using Aspose.Words.Drawing;

namespace ConsoleApplication1
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            //open an existing DOC document using the Document object class
            string ImageFilePath = "c:\\imagefolder";
            Document doc = new Document(ImageFilePath + "\\ImageFile.doc");

            //It gives a collection of all shape nodes in the tree
            NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true, false);
            int imageIndex = 0;

            foreach (Shape shape in shapes)
            {
                if (shape.HasImage)
                {
                    String name = "DocumentImage" + "_" + imageIndex.ToString() + ".bmp";
                    shape.ImageData.Save(ImageFilePath + "\\" + name);
                    imageIndex++;
                }
            }
        }
    }
}



Article Discussion: How to manipulate Word images programmatically
  rashid sarwar posted at 28-Jan-08 06:03
Original Article

 
  .NET version
  Muhammad Shakir replied to rashid sarwar at 16-May-08 02:21
Aslam-o-Alikum Rashid
I am a C# student learning C# 2. I have installed .NET 2 but couldn't find the reference you asked to add. Is it in .NET version 3?
Regards