Content Import

EmDash’s import system uses a pluggable source architecture. Each source knows how to probe, analyze, and fetch content from a specific platform.

Import Sources

Source ID       Platform                Probe  OAuth  Full Import
wxr             WordPress export file   No     No     Yes
wordpress-com   WordPress.com           Yes    Yes    Yes
wordpress-rest  Self-hosted WordPress   Yes    No     Probe only

WXR File Upload

WXR file upload is the most complete import method. Upload a WordPress eXtended RSS (WXR) export file directly in the EmDash admin dashboard.

Capabilities:

  • All post types (including custom)
  • All meta fields
  • Drafts and private posts
  • Full taxonomy hierarchy
  • Media attachment metadata

How to get a WXR file:

  1. In WordPress admin, go to Tools → Export
  2. Select All content or specific post types
  3. Click Download Export File
  4. Upload the .xml file to EmDash

WordPress.com OAuth

For sites hosted on WordPress.com, connect via OAuth to import without manual file exports.

  1. Enter your WordPress.com site URL
  2. Click Connect with WordPress.com
  3. Authorize EmDash in the WordPress.com popup
  4. Select content to import

What’s included:

  • Published and draft content
  • Private posts (with authorization)
  • Media files via API
  • Custom fields exposed to the REST API

WordPress REST API Probe

When you enter a URL, EmDash probes the site to detect WordPress and show available content:

Detected: WordPress 6.4
├── Posts: 127 (published)
├── Pages: 12 (published)
└── Media: 89 files

Note: Drafts and private content require authentication
or a full WXR export.

The REST probe is informational. For complete imports, it suggests uploading a WXR file or connecting via OAuth (for WordPress.com).
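
Counts like the ones above can be gathered from the public WordPress REST API, which reports collection totals in the `X-WP-Total` response header. The sketch below is one way to do that and is not taken from EmDash's own probe; `countEndpoint` and `readTotal` are hypothetical helpers.

```typescript
// Hypothetical helpers for illustration; not part of EmDash's API.
type CountableType = "posts" | "pages" | "media";

/** Minimal view of a header bag, compatible with fetch's Headers. */
interface HeaderReader {
	get(name: string): string | null;
}

/** Build the unauthenticated listing URL for one content type. */
function countEndpoint(siteUrl: string, type: CountableType): string {
	const base = siteUrl.endsWith("/") ? siteUrl : `${siteUrl}/`;
	const url = new URL(`wp-json/wp/v2/${type}`, base);
	url.searchParams.set("per_page", "1"); // one item is enough; the total is in the header
	return url.toString();
}

/** Read the total that WordPress reports in the X-WP-Total header. */
function readTotal(headers: HeaderReader): number | null {
	const raw = headers.get("X-WP-Total");
	if (raw === null) return null;
	const total = Number.parseInt(raw, 10);
	return Number.isNaN(total) ? null : total;
}
```

Without authentication the API lists only published items, which is why the probe cannot see drafts or private content.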

Import Flow

All sources follow the same flow:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Connect   │────▶│   Analyze   │────▶│   Prepare   │────▶│   Execute   │
│  (probe/    │     │  (schema    │     │  (create    │     │  (import    │
│   upload)   │     │   check)    │     │   schema)   │     │   content)  │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Step 1: Connect

Enter a URL to probe or upload a file directly.

URL probing runs all registered sources in parallel. The highest-confidence match determines the suggested next action:

  • WordPress.com site → Offer OAuth connection
  • Self-hosted WordPress → Show export instructions
  • Unknown → Suggest file upload
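
The selection step can be sketched as follows. This is an illustration rather than EmDash's implementation: only the `"definite"` confidence value appears in this document, so the other levels and their ranking are assumptions, as are the names `pickBestProbe` and `probeAll`.

```typescript
// Assumed confidence levels; this document only shows "definite".
type Confidence = "definite" | "likely" | "possible";

interface ProbeResult {
	sourceId: string;
	confidence: Confidence;
}

interface ProbeSource {
	id: string;
	probe?(url: string): Promise<ProbeResult | null>;
}

const RANK: Record<Confidence, number> = { definite: 3, likely: 2, possible: 1 };

/** Return the highest-confidence match, or null when nothing matched. */
function pickBestProbe(results: Array<ProbeResult | null>): ProbeResult | null {
	let best: ProbeResult | null = null;
	for (const result of results) {
		if (result && (!best || RANK[result.confidence] > RANK[best.confidence])) {
			best = result;
		}
	}
	return best;
}

/** Probe every source concurrently; a source that throws counts as "no match". */
async function probeAll(sources: ProbeSource[], url: string): Promise<ProbeResult | null> {
	const results = await Promise.all(
		sources.map(async (source) => {
			if (!source.probe) return null;
			try {
				return await source.probe(url);
			} catch {
				return null;
			}
		}),
	);
	return pickBestProbe(results);
}
```

A `null` result corresponds to the "Unknown" case, where the UI would suggest a file upload.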

Step 2: Analyze

The source parses content and checks schema compatibility:

Post Types:
├── post (127) → posts [New collection]
├── page (12)  → pages [Existing, compatible]
├── product (45) → products [Add 3 fields]
└── revision (234) → [Skip - internal type]

Required Schema Changes:
├── Create collection: posts
├── Add fields to pages: featured_image
└── Create collection: products

Each post type shows its status:

Status          Meaning
Ready           Collection exists with compatible fields
New collection  Will be created automatically
Add fields      Collection exists, missing fields added
Incompatible    Field type conflicts (manual fix needed)
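
The four statuses can be derived by comparing the fields an import needs against the fields a collection already has. The following is a minimal sketch under assumed types; `FieldType`, `FieldMap`, and `classifyCollection` are hypothetical names, and EmDash's real compatibility rules may be more lenient than strict type equality.

```typescript
// Hypothetical shapes for illustration; EmDash's analysis types may differ.
type FieldType = "string" | "text" | "number" | "boolean" | "datetime" | "image" | "portableText";
type FieldMap = Record<string, FieldType>;
type CollectionStatus = "Ready" | "New collection" | "Add fields" | "Incompatible";

/**
 * Compare the fields an import requires with an existing collection.
 * `existing` is undefined when the collection does not exist yet.
 */
function classifyCollection(existing: FieldMap | undefined, required: FieldMap): CollectionStatus {
	if (!existing) return "New collection";
	let missing = 0;
	for (const [name, type] of Object.entries(required)) {
		const current = existing[name];
		if (current === undefined) {
			missing += 1;
		} else if (current !== type) {
			return "Incompatible"; // a type conflict needs a manual fix
		}
	}
	return missing > 0 ? "Add fields" : "Ready";
}
```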

Step 3: Prepare Schema

Click Create Schema & Import to:

  1. Create new collections via SchemaRegistry
  2. Add missing fields with correct column types
  3. Set up content tables with indexes

Step 4: Execute Import

Content is imported sequentially. During the import:

  • Gutenberg/HTML converted to Portable Text
  • WordPress status mapped to EmDash status
  • WordPress authors mapped to ownership (authorId) and presentation bylines
  • Taxonomies created and linked
  • Reusable blocks (wp_block) imported as Sections
  • Progress shown in real-time

Author import behavior:

  • If an author mapping points to an EmDash user, ownership is set to that user and a linked byline is created/reused for the same user.
  • If there is no user mapping, a guest byline is created/reused from the WordPress author identity.
  • Imported entries get ordered byline credits, with the first credit set as primaryBylineId.
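
The rules above can be sketched as a small resolver. Only `authorId` and `primaryBylineId` are names from this document; the `Byline` and `Credits` shapes, the key scheme, and the choice to give ownership to the first mapped author are assumptions made for illustration.

```typescript
// Hypothetical shapes; only authorId and primaryBylineId come from the text above.
interface Byline {
	id: string;
	name: string;
	userId: string | null; // null for guest bylines
}

interface Credits {
	authorId: string | null;
	bylineIds: string[];
	primaryBylineId: string | null;
}

function resolveCredits(
	wpAuthors: string[], // WordPress author logins, in credit order
	userMap: Record<string, string>, // WordPress login -> EmDash user ID
	bylines: Map<string, Byline>, // existing bylines, keyed as below
): Credits {
	const bylineIds: string[] = [];
	let authorId: string | null = null;
	for (const login of wpAuthors) {
		const userId = userMap[login] ?? null;
		// Linked bylines are keyed by user, guest bylines by WordPress identity,
		// so a second import reuses them instead of creating duplicates.
		const key = userId !== null ? `user:${userId}` : `guest:${login}`;
		let byline = bylines.get(key);
		if (!byline) {
			byline = { id: `byline-${bylines.size + 1}`, name: login, userId };
			bylines.set(key, byline);
		}
		bylineIds.push(byline.id);
		// Assumption: ownership goes to the first author that maps to a user.
		if (authorId === null && userId !== null) authorId = userId;
	}
	return { authorId, bylineIds, primaryBylineId: bylineIds[0] ?? null };
}
```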

Step 5: Media Import (Optional)

After the content import, you can optionally import media:

  1. Analysis — Shows attachment counts by type

    Media found:
    ├── Images: 75 files
    ├── Video: 10 files
    └── Other: 4 files
  2. Download — Streams from WordPress URLs with progress

    Importing media...
    ├── 45 of 89 (50%)
    ├── Current: vacation-photo.jpg
    └── Status: Uploading
  3. Rewrite URLs — Content automatically updated with new URLs

Media import uses content hashing (xxHash64) for deduplication. The same image used in multiple posts is stored once.
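
Content-hash deduplication can be illustrated with a small in-memory store. xxHash64 is not available in Node's standard library, so this sketch swaps in SHA-256 from `node:crypto`; the `DedupStore` class and its key format are hypothetical and do not reflect EmDash's storage layout.

```typescript
import { createHash } from "node:crypto";

/** Content-addressed store: identical bytes map to one stored object. */
class DedupStore {
	private readonly keysByHash = new Map<string, string>();

	/** Returns the storage key, reusing an existing one when the bytes match. */
	put(bytes: Uint8Array): { key: string; stored: boolean } {
		// SHA-256 stands in for xxHash64 here; any stable content hash works the same way.
		const hash = createHash("sha256").update(bytes).digest("hex");
		const existing = this.keysByHash.get(hash);
		if (existing !== undefined) return { key: existing, stored: false };
		const key = `media-${hash.slice(0, 16)}`; // hypothetical key format
		this.keysByHash.set(hash, key);
		return { key, stored: true };
	}
}
```

Because the key depends only on the file's bytes, an image referenced from several posts resolves to the same key regardless of its original URL.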

Source Interface

Import sources implement a standard interface:

interface ImportSource {
	/** Unique identifier */
	id: string;

	/** Display name */
	name: string;

	/** Probe a URL (optional) */
	probe?(url: string): Promise<SourceProbeResult | null>;

	/** Analyze content from this source */
	analyze(input: SourceInput, context: ImportContext): Promise<ImportAnalysis>;

	/** Stream content items */
	fetchContent(input: SourceInput, options: FetchOptions): AsyncGenerator<NormalizedItem>;
}

Input Types

Sources accept different input types:

// File upload (WXR)
{ type: "file", file: File }

// URL with optional token (REST API)
{ type: "url", url: string, token?: string }

// OAuth connection (WordPress.com)
{ type: "oauth", url: string, accessToken: string }
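
The three shapes form a discriminated union on `type`, so a source can narrow the input with a `switch`. In this sketch, `FileLike` is a structural stand-in for the browser `File` type so the example runs anywhere, and `describeInput` is a hypothetical helper.

```typescript
/** Structural stand-in for the browser File type (assumption for portability). */
interface FileLike {
	name: string;
	size: number;
}

type SourceInput =
	| { type: "file"; file: FileLike }
	| { type: "url"; url: string; token?: string }
	| { type: "oauth"; url: string; accessToken: string };

/** Describe an input for logging without leaking credentials. */
function describeInput(input: SourceInput): string {
	switch (input.type) {
		case "file":
			return `file upload: ${input.file.name}`;
		case "url":
			return `URL ${input.url}${input.token ? " (with token)" : ""}`;
		case "oauth":
			return `OAuth connection to ${input.url}`;
	}
}
```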

Normalized Output

All sources produce the same normalized format:

interface NormalizedItem {
	sourceId: string | number;
	postType: string;
	status: "publish" | "draft" | "pending" | "private" | "future";
	slug: string;
	title: string;
	content: PortableTextBlock[];
	excerpt?: string;
	date: Date;
	author?: string;
	authors?: string[];
	categories?: string[];
	tags?: string[];
	meta?: Record<string, unknown>;
	featuredImage?: string;
}

API Endpoints

The import system exposes these endpoints:

Probe URL

POST /_emdash/api/import/probe
Content-Type: application/json

{ "url": "https://example.com" }

Returns detected platform and suggested action.

Analyze WXR

POST /_emdash/api/import/wordpress/analyze
Content-Type: multipart/form-data

file: [WordPress export .xml]

Returns post type analysis with schema compatibility.

Prepare Schema

POST /_emdash/api/import/wordpress/prepare
Content-Type: application/json

{
  "postTypes": [
    { "name": "post", "collection": "posts", "enabled": true }
  ]
}

Creates collections and fields.
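
A client could assemble this request as shown below. The endpoint path and body shape come from the example above; the helper name `buildPrepareRequest` is hypothetical, and the response format is not documented here, so the sketch stops at building the request.

```typescript
interface PostTypeSelection {
	name: string; // WordPress post type, e.g. "post"
	collection: string; // target EmDash collection
	enabled: boolean;
}

interface PreparedRequest {
	url: string;
	init: { method: string; headers: Record<string, string>; body: string };
}

/** Build the URL and fetch options for the prepare endpoint. */
function buildPrepareRequest(postTypes: PostTypeSelection[]): PreparedRequest {
	return {
		url: "/_emdash/api/import/wordpress/prepare",
		init: {
			method: "POST",
			headers: { "Content-Type": "application/json" },
			body: JSON.stringify({ postTypes }),
		},
	};
}
```

The result can be passed straight to `fetch(request.url, request.init)`.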

Execute Import

POST /_emdash/api/import/wordpress/execute
Content-Type: multipart/form-data

file: [WordPress export .xml]
config: { "postTypeMappings": { "post": { "collection": "posts" } } }

Imports content to specified collections.

Import Media

POST /_emdash/api/import/wordpress/media
Content-Type: application/json

{
  "attachments": [{ "id": 123, "url": "https://..." }],
  "stream": true
}

Streams NDJSON progress updates during download/upload.
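
NDJSON is one JSON value per line, and network chunks do not necessarily end on line boundaries, so a client has to buffer partial lines. The parser below is a generic sketch; the fields of each progress event are not documented here, so values are returned as `unknown`.

```typescript
/** Incrementally splits an NDJSON stream into parsed values. */
class NdjsonParser {
	private buffer = "";

	/** Feed one decoded text chunk; returns every complete line parsed as JSON. */
	push(chunk: string): unknown[] {
		this.buffer += chunk;
		const lines = this.buffer.split("\n");
		this.buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
		return lines
			.filter((line) => line.trim() !== "")
			.map((line) => JSON.parse(line) as unknown);
	}

	/** Call once the stream ends to parse a final line that had no trailing newline. */
	flush(): unknown[] {
		const rest = this.buffer.trim();
		this.buffer = "";
		return rest === "" ? [] : [JSON.parse(rest) as unknown];
	}
}
```

In a browser, each chunk would typically come from reading `response.body` through a `TextDecoder` before being passed to `push`.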

Rewrite URLs

POST /_emdash/api/import/wordpress/rewrite-urls
Content-Type: application/json

{
  "urlMap": { "https://old.com/image.jpg": "/_emdash/media/abc123" }
}

Updates Portable Text content with new media URLs.
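
Conceptually, the rewrite walks each Portable Text document and swaps any string value that matches a key in `urlMap`. The sketch below shows that idea on plain JSON; `rewriteUrls` is a hypothetical function, and it only replaces exact matches such as a link `href`, leaving URLs that appear inside longer text untouched.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

/** Return a copy of `value` with every string that exactly matches a map key replaced. */
function rewriteUrls(value: Json, urlMap: ReadonlyMap<string, string>): Json {
	if (typeof value === "string") return urlMap.get(value) ?? value;
	if (Array.isArray(value)) return value.map((item) => rewriteUrls(item, urlMap));
	if (value !== null && typeof value === "object") {
		const copy: { [key: string]: Json } = {};
		for (const [key, child] of Object.entries(value)) {
			copy[key] = rewriteUrls(child, urlMap);
		}
		return copy;
	}
	return value; // numbers, booleans, and null pass through unchanged
}
```

A `urlMap` object like the one in the request body can be converted with `new Map(Object.entries(urlMap))`.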

Error Handling

Recoverable Errors

  • Network timeout — Retried with backoff
  • Single item parse failure — Logged, skipped, import continues
  • Media download failure — Marked for manual handling
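
Retrying with backoff can be sketched as below. The attempt count and delays are illustrative defaults, not EmDash's actual settings, and `retryWithBackoff` is a hypothetical helper. The `sleep` option is injectable so the behavior can be tested without waiting.

```typescript
interface RetryOptions {
	attempts?: number; // total tries, including the first
	baseDelayMs?: number; // delay before the first retry
	sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
	new Promise((resolve) => setTimeout(resolve, ms));

/** Run `task`, retrying on failure with exponentially growing delays. */
async function retryWithBackoff<T>(task: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
	const { attempts = 4, baseDelayMs = 500, sleep = defaultSleep } = options;
	let lastError: unknown;
	for (let attempt = 0; attempt < attempts; attempt++) {
		try {
			return await task();
		} catch (error) {
			lastError = error;
			// The delay doubles each time; there is no wait after the final attempt.
			if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
		}
	}
	throw lastError;
}
```

Once the attempts are exhausted the last error is rethrown, at which point an importer could record the item for manual handling.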

Fatal Errors

  • Invalid file format — Import stops with error message
  • Database connection lost — Import pauses, allows resume
  • Storage quota exceeded — Import stops, shows usage

Error Report

After the import completes, EmDash shows a summary report:

Import Complete

✓ 125 posts imported
✓ 12 pages imported
✓ 85 media references recorded

⚠ 2 items had warnings:
  - Post "Special Characters ñ" - title encoding fixed
  - Page "About" - duplicate slug renamed to "about-1"

✗ 1 item failed:
  - Post ID 456 - content parsing error (saved as draft)

Failed items are saved as drafts with original content in _importError for review.
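
The duplicate-slug warning in the report suggests a numeric suffix scheme. A minimal sketch of that scheme follows; `uniqueSlug` is a hypothetical helper, and the exact suffix rules EmDash applies are an assumption based on the `about-1` example.

```typescript
/**
 * Return `slug` if it is free, otherwise `slug-1`, `slug-2`, and so on.
 * The chosen slug is added to `taken` so later calls see it.
 */
function uniqueSlug(slug: string, taken: Set<string>): string {
	let candidate = slug;
	for (let n = 1; taken.has(candidate); n++) {
		candidate = `${slug}-${n}`;
	}
	taken.add(candidate);
	return candidate;
}
```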

Building Custom Sources

Create a source for other platforms:

import type { ImportSource } from "emdash/import";

export const mySource: ImportSource = {
	id: "my-platform",
	name: "My Platform",
	description: "Import from My Platform",
	icon: "globe",
	canProbe: true,

	async probe(url) {
		// Check if URL matches your platform; an unreachable site counts as "no match"
		const base = url.replace(/\/+$/, ""); // avoid a double slash in the path
		const response = await fetch(`${base}/api/info`).catch(() => null);
		if (!response?.ok) return null;

		return {
			sourceId: "my-platform",
			confidence: "definite",
			detected: { platform: "my-platform" },
			// ...
		};
	},

	async analyze(input, context) {
		// Parse and analyze content
		// Return ImportAnalysis
		throw new Error("analyze() is not implemented yet");
	},

	async *fetchContent(input, options) {
		// loadItems and convertToPortableText are placeholders for your own code
		const items = await loadItems(input);

		// Yield NormalizedItem for each content piece
		for (const item of items) {
			yield {
				sourceId: item.id,
				postType: "post",
				title: item.title,
				content: convertToPortableText(item.body),
				// ... remaining required fields: status, slug, date
			};
		}
	},
};

Register the source in your EmDash configuration:

import { mySource } from "./src/import/custom-source";

export default defineConfig({
	integrations: [
		emdash({
			import: {
				sources: [mySource],
			},
		}),
	],
});
