Data Processing

Use AI agents to extract structured data from unstructured sources like emails, invoices, PDFs, and form submissions. This guide covers schema definition, validation, and batch processing.

Define Your Schema

Start by defining the schema for the output your agent should produce. Use JSON Schema for strict validation:

Invoice schema
const invoiceSchema = {
  type: "object",
  properties: {
    vendorName: { type: "string" },
    invoiceNumber: { type: "string" },
    date: { type: "string", format: "date" },
    dueDate: { type: "string", format: "date" },
    lineItems: {
      type: "array",
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          quantity: { type: "number" },
          unitPrice: { type: "number" },
          total: { type: "number" },
        },
        required: ["description", "quantity", "unitPrice", "total"],
      },
    },
    subtotal: { type: "number" },
    tax: { type: "number" },
    total: { type: "number" },
    currency: { type: "string", enum: ["USD", "EUR", "GBP"] },
  },
  required: ["vendorName", "invoiceNumber", "date", "total"],
};
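To make the schema's `required` and `enum` rules concrete, here is a minimal sketch of a local check on extracted data. The helper `checkRequiredAndEnums` and the trimmed `invoiceRules` object are hypothetical names introduced for illustration and are not part of any SDK. The sketch covers only those two rules. A full JSON Schema validator would also check types, nested items, and (depending on configuration) `format`.

```typescript
// Hypothetical helper for illustration: checks only `required` and `enum`.
type FieldSchema = { type: string; enum?: readonly string[] };
type ObjectSchema = {
  properties: Record<string, FieldSchema>;
  required: readonly string[];
};

function checkRequiredAndEnums(
  schema: ObjectSchema,
  data: Record<string, unknown>,
): string[] {
  const errors: string[] = [];

  // Required fields: treat null like missing, since the agent is told
  // to use null for values it cannot determine.
  for (const key of schema.required) {
    if (data[key] === undefined || data[key] === null) {
      errors.push(`missing required field: ${key}`);
    }
  }

  // Enum fields: only checked when a value is actually present.
  for (const key of Object.keys(schema.properties)) {
    const allowed = schema.properties[key].enum;
    const value = data[key];
    if (allowed && value !== undefined && value !== null) {
      if (allowed.indexOf(value as string) === -1) {
        errors.push(`${key} must be one of ${allowed.join(", ")}`);
      }
    }
  }
  return errors;
}

// Trimmed copy of the invoice schema's top-level rules (assumption: same
// field names as the schema in this guide).
const invoiceRules: ObjectSchema = {
  properties: {
    vendorName: { type: "string" },
    invoiceNumber: { type: "string" },
    date: { type: "string" },
    total: { type: "number" },
    currency: { type: "string", enum: ["USD", "EUR", "GBP"] },
  },
  required: ["vendorName", "invoiceNumber", "date", "total"],
};

// Sample record with a missing date and an unsupported currency.
const problems = checkRequiredAndEnums(invoiceRules, {
  vendorName: "Acme Supplies LLC",
  invoiceNumber: "INV-2026-0342",
  total: 1839.08,
  currency: "CAD",
});
console.log(problems);
```

For the sample record, `problems` contains two entries: one for the missing `date` and one for the unsupported currency.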

Create the Extraction Agent

Data extraction agent
const extractor = await client.agents.create({
  name: "Invoice Extractor",
  model: "claude-sonnet-4-20250514",
  systemPrompt: `You are a data extraction agent. Extract structured data
from the provided document according to the schema.

Rules:
- Extract ALL fields present in the document
- Use null for fields that cannot be determined
- Flag any ambiguous values with a confidence score
- Validate numbers: line item totals should equal qty * unit price
- Dates should be in ISO 8601 format (YYYY-MM-DD)
- Always return valid JSON matching the schema`,
  tools: [
    {
      name: "validate_schema",
      description: "Validate extracted data against the expected schema",
      parameters: {
        type: "object",
        properties: {
          data: { type: "object" },
          schemaName: { type: "string" },
        },
        required: ["data", "schemaName"],
      },
      handler: "https://api.example.com/webhooks/validate",
    },
    {
      name: "save_record",
      description: "Save the validated record to the database",
      parameters: {
        type: "object",
        properties: {
          data: { type: "object" },
          collection: { type: "string" },
        },
        required: ["data", "collection"],
      },
      handler: "https://api.example.com/webhooks/records/save",
    },
  ],
  parameters: { temperature: 0.1, maxTokens: 2048 },
});
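The system prompt asks the agent to check that each line item total equals quantity times unit price. The same rule can also be re-checked on your side after extraction. The following is a minimal sketch, assuming line items shaped like the `lineItems` entries in the schema; `LineItem`, `toCents`, and `findLineItemMismatches` are hypothetical names, not SDK functions.

```typescript
// Shape mirrors the lineItems entries in the invoice schema.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
}

// Convert to integer cents (rounded) so the comparison is not thrown off
// by floating-point noise such as 0.1 + 0.2 !== 0.3.
function toCents(amount: number): number {
  return Math.round(amount * 100);
}

// Returns the line items whose total does not equal quantity * unitPrice.
function findLineItemMismatches(items: LineItem[]): LineItem[] {
  return items.filter(
    (item) => toCents(item.quantity * item.unitPrice) !== toCents(item.total),
  );
}

// Widget A is consistent; Widget B has a deliberately wrong total.
const mismatches = findLineItemMismatches([
  { description: "Widget A", quantity: 100, unitPrice: 12.5, total: 1250 },
  { description: "Widget B", quantity: 50, unitPrice: 8, total: 410 },
]);
console.log(mismatches.map((m) => m.description));
```

Here only "Widget B" is reported, because 50 * 8 is 400 rather than 410. Comparing in cents assumes a currency with two decimal places.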

Process Documents

Single document
const result = await client.agents.execute(extractor.id, {
  input: `Extract data from this invoice:

INVOICE #INV-2026-0342
From: Acme Supplies LLC
Date: March 10, 2026
Due: April 10, 2026

Items:
- Widget A (x100) @ $12.50 each = $1,250.00
- Widget B (x50) @ $8.00 each = $400.00
- Shipping = $45.00

Subtotal: $1,695.00
Tax (8.5%): $144.08
Total: $1,839.08 USD`,
});

// Agent extracts structured JSON and validates it
console.log(result.output);

Batch Processing

Batch processing
async function processInvoiceBatch(documents: string[]) {
  const results = [];

  for (const doc of documents) {
    const result = await client.agents.execute(extractor.id, {
      input: `Extract invoice data:\n\n${doc}`,
    });

    // Parse the structured output
    try {
      const data = JSON.parse(result.output);
      results.push({ status: "success", data });
    } catch {
      results.push({
        status: "error",
        raw: result.output,
        error: "Failed to parse structured output",
      });
    }
  }

  return results;
}

// Process 100 invoices
const invoices = await fetchPendingInvoices();
const extracted = await processInvoiceBatch(invoices);

console.log(`Processed: ${extracted.length}`);
console.log(`Success: ${extracted.filter(r => r.status === "success").length}`);

Note

The loop above processes documents one at a time. For high-volume batches, run requests in parallel with a concurrency limit: parallel requests improve throughput, and the limit keeps you within rate limits. The SDK handles rate limiting automatically.
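Parallel execution with a concurrency limit can be sketched as a small worker-pool helper. This is an illustration under stated assumptions, not an SDK feature: `mapWithConcurrency` and `demo` are hypothetical names, and the demo uses a stand-in worker in place of a real `client.agents.execute` call.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so lanes never collide
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) lanes.push(runLane());
  await Promise.all(lanes);
  return results;
}

// Demo with a stand-in worker that records the peak number of
// simultaneous calls. Replace the worker body with a real request.
async function demo(): Promise<{ outputs: string[]; peak: number }> {
  let inFlight = 0;
  let peak = 0;
  const outputs = await mapWithConcurrency(
    ["a", "b", "c", "d", "e"],
    2,
    async (doc) => {
      inFlight++;
      peak = Math.max(peak, inFlight);
      await Promise.resolve(); // yield, as a network call would
      inFlight--;
      return doc.toUpperCase();
    },
  );
  return { outputs, peak };
}
```

In the batch function, the worker would wrap the `client.agents.execute` call and the `JSON.parse` step for a single document. Because `Promise.all` rejects as soon as any lane rejects, it is safer for the worker to catch its own errors and return an error record, as the sequential loop does for parse failures.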

Validation Patterns

Validation
// Add validation to your extraction pipeline
async function extractAndValidate(document: string) {
  const result = await client.agents.execute(extractor.id, {
    input: `Extract and validate this invoice data.
After extraction, call validate_schema to verify.
If validation fails, fix the issues and re-validate.
Finally, call save_record to persist the result.

Document:
${document}`,
  });

  // The agent may re-validate after fixing issues, so check the last
  // validate_schema call rather than the first (assumes toolCalls is
  // listed in call order).
  const validationCalls = result.toolCalls.filter(
    (tc) => tc.name === "validate_schema"
  );
  const lastValidation = validationCalls[validationCalls.length - 1];

  return {
    output: result.output,
    validated: lastValidation?.result?.valid === true,
    saved: result.toolCalls.some((tc) => tc.name === "save_record"),
  };
}

Use Cases

  • Invoice processing: Extract vendor, line items, totals, and dates
  • Resume parsing: Extract skills, experience, and education into structured profiles
  • Email triage: Classify, extract action items, and route to the right team
  • Contract analysis: Extract key terms, dates, obligations, and risks
  • Survey responses: Categorize free-text answers into structured themes