Of course! Converting Word documents (.docx) to HTML is a common requirement for web applications. Here’s a comprehensive guide on how to do this in Java, covering the most popular and effective libraries.

Summary of Approaches
| Library | How it Works | Pros | Cons | Best For |
|---|---|---|---|---|
| Apache POI | Low-level API (XWPF) over the .docx XML structure. | Full control; mature, widely used Apache project. | Verbose: there is no built-in .docx-to-HTML converter, so you map every paragraph, run, table, and image yourself. | Developers who need maximum control and are willing to write a lot of code. |
| docx4j | Higher-level library that maps the .docx XML to Java objects (via JAXB) and includes an HTML exporter. | Good support for complex Word features (styles, tables, images, headers, footers). | Large dependency footprint; docx4j 11.x also requires a separate JAXB implementation module. | Most use cases. The recommended choice for robust, high-fidelity conversions. |
| Freemarker | A templating engine. You design an HTML template and populate it with data extracted from the Word document (using POI or docx4j). | Maximum flexibility for the final HTML output. You control the exact structure and styling. | Requires two steps (extract data from Word, then apply it to the template), and any formatting you do not extract is lost. | Projects where the final HTML must follow a specific, predefined structure. |

Approach 1: Using Apache POI (The "Hard Way")
Apache POI is the best-known Java library for Microsoft Office formats, but its component for .docx files (XWPF) is very low-level and has no built-in HTML converter (POI's WordToHtmlConverter lives in the HWPF module and handles only the legacy .doc format). Converting to HTML therefore means iterating through every paragraph, run, and table yourself and generating the corresponding HTML tags.
This is not recommended for a quick solution, but it is useful for understanding what happens under the hood.
Example Code (Simplified)
import org.apache.poi.xwpf.usermodel.*;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
public class PoiToHtmlConverter {
    public static void main(String[] args) throws Exception {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><meta charset=\"UTF-8\"></head><body>");
        // 1. Load the Word document (try-with-resources closes both the stream and the document)
        try (InputStream in = new FileInputStream("input.docx");
             XWPFDocument document = new XWPFDocument(in)) {
            // 2. Walk the body in document order so tables stay where they appear
            for (IBodyElement element : document.getBodyElements()) {
                if (element instanceof XWPFParagraph) {
                    appendParagraph(html, (XWPFParagraph) element);
                } else if (element instanceof XWPFTable) {
                    appendTable(html, (XWPFTable) element);
                }
            }
        }
        html.append("</body></html>");
        // 3. Write the HTML as UTF-8 to match the <meta charset> declaration
        Files.write(Paths.get("output_poi.html"), html.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("Conversion complete. Check output_poi.html");
    }
    private static void appendParagraph(StringBuilder html, XWPFParagraph p) {
        html.append("<p style=\"text-align: ").append(toCssAlignment(p.getAlignment())).append("\">");
        // 4. Iterate through runs (stretches of text with the same formatting)
        for (XWPFRun r : p.getRuns()) {
            String style = (r.isBold() ? "font-weight: bold;" : "")
                    + (r.isItalic() ? "font-style: italic;" : "")
                    + (r.getFontSize() != -1 ? "font-size: " + r.getFontSize() + "pt;" : "");
            // text() never returns null, unlike getText(0)
            html.append("<span style=\"").append(style).append("\">")
                .append(escapeHtml(r.text()))
                .append("</span>");
        }
        html.append("</p>");
    }
    // 5. Handle tables (this is even more complex; only plain cell text is kept here)
    private static void appendTable(StringBuilder html, XWPFTable table) {
        html.append("<table border=\"1\">");
        for (XWPFTableRow row : table.getRows()) {
            html.append("<tr>");
            for (XWPFTableCell cell : row.getTableCells()) {
                html.append("<td>");
                for (XWPFParagraph p : cell.getParagraphs()) {
                    html.append(escapeHtml(p.getText()));
                }
                html.append("</td>");
            }
            html.append("</tr>");
        }
        html.append("</table>");
    }
    // ParagraphAlignment is a Java enum, so it can be used in a switch
    private static String toCssAlignment(ParagraphAlignment alignment) {
        if (alignment == null) return "left";
        switch (alignment) {
            case CENTER: return "center";
            case RIGHT: return "right";
            case BOTH: return "justify";
            default: return "left";
        }
    }
    private static String escapeHtml(String input) {
        // Replace & first so the entities added afterwards are not escaped twice
        return input.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }
}
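As you can see, this is a lot of work, and it still ignores images, headers, footers, lists, and hyperlink targets. It also reads only direct run formatting: isBold() and getFontSize() do not resolve formatting inherited from paragraph or character styles.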
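Under the hood, a .docx file is a ZIP package of XML parts: the body lives in word/document.xml, while images, headers, and footers sit in separate parts (typically under word/media/ and in files such as word/header1.xml), which is why none of them show up when you only walk the body paragraphs and tables. The following sketch uses only the JDK to list those parts and turn each w:p element into a <p> by concatenating its w:t text. The class DocxXmlPeek and its methods are hypothetical, written for illustration; it ignores all formatting, and text inside text boxes may be emitted twice because nested paragraphs are not special-cased.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only helper that peeks inside a .docx package. */
public class DocxXmlPeek {
    // Namespace of the WordprocessingML elements w:p (paragraph) and w:t (text)
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Records every part name in the ZIP and returns the bytes of word/document.xml. */
    public static byte[] readMainPart(InputStream docx, List<String> partNames) throws IOException {
        byte[] main = null;
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                partNames.add(entry.getName());
                if (entry.getName().equals("word/document.xml")) {
                    main = zip.readAllBytes(); // reads only the current entry (Java 9+)
                }
            }
        }
        if (main == null) {
            throw new IOException("word/document.xml not found; is this a .docx file?");
        }
        return main;
    }

    /** Emits one <p> per w:p element, concatenating the text of its w:t descendants. */
    public static String paragraphsToHtml(byte[] documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); // block XXE
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(documentXml));
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            html.append("<p>").append(escapeHtml(text.toString())).append("</p>\n");
        }
        return html.toString();
    }

    // Replace & first so the entities added afterwards are not escaped twice
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        List<String> parts = new ArrayList<>();
        byte[] mainPart;
        try (InputStream in = Files.newInputStream(Paths.get("input.docx"))) {
            mainPart = readMainPart(in, parts);
        }
        parts.forEach(name -> System.out.println("part: " + name));
        System.out.print(paragraphsToHtml(mainPart));
    }
}
```

Pointing it at a real input.docx typically lists parts such as [Content_Types].xml and word/document.xml, followed by one <p> line per paragraph (table-cell paragraphs included, since getElementsByTagNameNS searches the whole tree).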
Approach 2: Using docx4j (The Recommended Way)
docx4j ships with a built-in HTML exporter (Docx4J.toHTML) that converts a Word document to HTML, translating most paragraph, run, and table formatting into CSS automatically.

Step 1: Add the Dependencies
Add docx4j to your project. If you're using Maven, add the following to your pom.xml. docx4j 11.x needs docx4j-core plus one JAXB implementation module; the HTML exporter is part of docx4j-core, and docx4j-export-fo is only needed for PDF output.
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-core</artifactId>
    <version>11.4.4</version> <!-- Use the latest 11.x version -->
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>11.4.4</version> <!-- JAXB implementation; docx4j-JAXB-MOXy is an alternative -->
</dependency>
Step 2: Write the Java Code
The code is remarkably simple.
import org.docx4j.Docx4J;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
public class Docx4jToHtmlConverter {
    public static void main(String[] args) throws Exception {
        // 1. Load the Word document
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("input.docx"));
        // 2. Configure the HTML export; embedded images are saved to the image directory
        //    and referenced from the HTML through the target URI
        HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
        htmlSettings.setOpcPackage(wordMLPackage);
        htmlSettings.setImageDirPath("output_docx4j_files");
        htmlSettings.setImageTargetUri("output_docx4j_files");
        // 3. Export to HTML (FLAG_EXPORT_PREFER_XSL selects the XSLT-based exporter)
        try (OutputStream os = new FileOutputStream("output_docx4j.html")) {
            Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
        }
        System.out.println("Conversion complete. Check output_docx4j.html");
    }
}
This code produces output_docx4j.html, with the document's styles converted to CSS, preserving the look and feel of the original document much better than the POI example.
Approach 3: Using Freemarker (The Template-Driven Way)
This approach is different. You don't directly convert Word to HTML. Instead, you use a library (like docx4j or Apache POI) to extract data from the Word document, and then use Freemarker to render this data into a pre-defined HTML template.

This is ideal when you need the final HTML to match a specific design (e.g., a corporate website template).
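The extract-then-render split can be sketched without any library. In this minimal JDK-only example, the hypothetical DocModel class stands in for whatever you extract with docx4j or POI, and a hand-written render method takes the place of the Freemarker template; it illustrates the structure and is not a replacement for Freemarker.

```java
import java.util.List;

/** Hypothetical illustration of the extract-then-render split (Freemarker replaced by plain Java). */
public class TwoStepSketch {
    /** Step 1 output: plain data, independent of docx4j or POI types. */
    public static final class DocModel {
        final String title;
        final List<String> paragraphs;
        final List<List<String>> tableRows;

        public DocModel(String title, List<String> paragraphs, List<List<String>> tableRows) {
            this.title = title;
            this.paragraphs = paragraphs;
            this.tableRows = tableRows;
        }
    }

    /** Step 2: the fixed layout; in the real approach this is the Freemarker template. */
    public static String render(DocModel doc) {
        StringBuilder html = new StringBuilder();
        html.append("<h1>").append(escape(doc.title)).append("</h1>\n");
        for (String para : doc.paragraphs) {
            html.append("<p>").append(escape(para)).append("</p>\n");
        }
        if (!doc.tableRows.isEmpty()) { // plays the role of <#if document.hasTable>
            html.append("<table>\n");
            for (List<String> row : doc.tableRows) {
                html.append("<tr>");
                for (String cell : row) {
                    html.append("<td>").append(escape(cell)).append("</td>");
                }
                html.append("</tr>\n");
            }
            html.append("</table>\n");
        }
        return html.toString();
    }

    /** Minimal HTML escaping; & must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        DocModel doc = new DocModel("Quarterly Report",
                List.of("Revenue grew.", "R&D costs were flat."),
                List.of(List.of("Q1", "10"), List.of("Q2", "12")));
        System.out.print(render(doc));
    }
}
```

Because render depends only on DocModel, changing the page design touches the rendering layer alone, while the extraction code stays the same.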
Step 1: Add Dependencies
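You'll need docx4j (plus a JAXB implementation module, which docx4j 11.x requires) to read the Word file and Freemarker for templating.
<!-- pom.xml -->
<dependencies>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-core</artifactId>
        <version>11.4.4</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
        <version>11.4.4</version>
    </dependency>
    <dependency>
        <groupId>org.freemarker</groupId>
        <artifactId>freemarker</artifactId>
        <version>2.3.32</version>
    </dependency>
</dependencies>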
Step 2: Create an HTML Template
Create a file named template.ftl in a src/main/resources/templates directory.
<!-- src/main/resources/templates/template.ftl -->
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>${document.title}</title>
<style>
body { font-family: sans-serif; }
.content { max-width: 800px; margin: auto; }
</style>
</head>
<body>
<div class="content">
<h1>${document.title}</h1>
<p><em>Generated on: ${.now?string("yyyy-MM-dd HH:mm")}</em></p>
<#list document.paragraphs as para>
<p>${para.text}</p>
</#list>
<#if document.hasTable>
<h2>Data Table</h2>
<table border="1">
<#list document.tableData as row>
<tr>
<#list row as cell>
<td>${cell}</td>
</#list>
</tr>
</#list>
</table>
</#if>
</div>
</body>
</html>
Step 3: Write the Java Code
This code extracts simple data from the Word document and uses Freemarker to fill the template.
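import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.Tbl;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Text;
import org.docx4j.wml.Tr;
import freemarker.core.HTMLOutputFormat;
import freemarker.template.Configuration;
import freemarker.template.Template;
import java.io.File;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class FreemarkerDocxConverter {
    public static void main(String[] args) throws Exception {
        // 1. Load the document
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("input.docx"));
        MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
        // 2. Extract data; body content may be wrapped in JAXBElement, so unwrap it first
        List<Map<String, String>> paragraphs = new ArrayList<>();
        List<List<String>> tableData = new ArrayList<>();
        for (Object o : documentPart.getContent()) {
            Object element = XmlUtils.unwrap(o);
            if (element instanceof P) {
                // The template reads ${para.text}, so each paragraph is a hash with a "text" key
                paragraphs.add(Map.of("text", getParagraphText((P) element)));
            } else if (element instanceof Tbl) {
                // Rows of every top-level table are collected into one list
                for (Object rowObj : ((Tbl) element).getContent()) {
                    Object row = XmlUtils.unwrap(rowObj);
                    if (!(row instanceof Tr)) continue;
                    List<String> cells = new ArrayList<>();
                    for (Object cellObj : ((Tr) row).getContent()) {
                        Object cell = XmlUtils.unwrap(cellObj);
                        if (!(cell instanceof Tc)) continue;
                        StringBuilder cellText = new StringBuilder();
                        for (Object c : ((Tc) cell).getContent()) {
                            Object cellPara = XmlUtils.unwrap(c);
                            if (cellPara instanceof P) cellText.append(getParagraphText((P) cellPara));
                        }
                        cells.add(cellText.toString());
                    }
                    tableData.add(cells);
                }
            }
        }
        // The template reads everything through a top-level "document" hash
        Map<String, Object> document = new HashMap<>();
        document.put("title", "My Document Report"); // You could extract this from a custom property
        document.put("paragraphs", paragraphs);
        document.put("hasTable", !tableData.isEmpty());
        document.put("tableData", tableData);
        Map<String, Object> dataModel = new HashMap<>();
        dataModel.put("document", document);
        // 3. Configure Freemarker; the HTML output format auto-escapes ${...} values
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_32);
        cfg.setDirectoryForTemplateLoading(new File("src/main/resources/templates"));
        cfg.setDefaultEncoding("UTF-8");
        cfg.setOutputFormat(HTMLOutputFormat.INSTANCE);
        // 4. Process the template and write UTF-8 output
        Template template = cfg.getTemplate("template.ftl");
        try (Writer out = Files.newBufferedWriter(Paths.get("output_freemarker.html"), StandardCharsets.UTF_8)) {
            template.process(dataModel, out);
        }
        System.out.println("Conversion complete. Check output_freemarker.html");
    }
    // Concatenates the w:t text of runs directly inside the paragraph
    // (text inside hyperlinks, fields, etc. is not visited in this simplified version)
    private static String getParagraphText(P paragraph) {
        StringBuilder text = new StringBuilder();
        for (Object o : paragraph.getContent()) {
            Object child = XmlUtils.unwrap(o);
            if (child instanceof R) {
                for (Object rc : ((R) child).getContent()) {
                    Object runChild = XmlUtils.unwrap(rc);
                    if (runChild instanceof Text) {
                        text.append(((Text) runChild).getValue());
                    }
                }
            }
        }
        return text.toString();
    }
}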
Final Recommendation
- For quick and reliable conversions that preserve formatting, use docx4j. It's the most practical and powerful solution for this task.
- If you need complete control over the final HTML structure and styling, and are willing to build a data-extraction step with docx4j or POI, use Freemarker.
- Avoid using Apache POI directly for this conversion unless you have very specific, low-level requirements and are prepared for a significant development effort.
