Skip to main content
Version: 2.0-beta.1
File Extractor node

File Extractor node

File Extractor Node

Overview

  • Type: transform.file.extractor
  • Display Name: File Extractor
  • Category: transform
  • Execution: supportsExecution: true
  • I/O Handles:
    • Input: defaultIn (left)
    • Outputs: result (Success), error (Error)

Purpose: Convert binary file data to JSON by extracting and parsing data from CSV and Excel files with advanced filtering, column selection/mapping, and row manipulation.


Configuration Reference

File Extractor – Basic and format

Basic: format, input field, CSV/Excel specifics

Basic

File Extractor – Columns configuration

Columns: ranges, selection mode, mappings

Columns

File Extractor – Filters and processing

Filters, rows, and processing options

Filters, Rows, Processing

Basic

FieldTypeDefaultDescription
File FormatenumCSVOne of CSV
Input Fieldstringdata.dataDot path to binary/encoded file content from the incoming message.

CSV-Specific

FieldTypeDefaultDescription
Delimiterstring (1 char),Character to separate values. One of , ; \t `

Excel-Specific

FieldTypeDefaultDescription
Sheet Namestring""Target sheet; empty = first sheet.

Rows

FieldTypeDefaultVisible WhenDescription
Range ModeenumautoExcel onlyOne of auto
Start Rownumber (min 1)1CSV always, Excel manualFirst row to read (1-based).
End Rownumber0CSV always, Excel manual0 = all. If non-zero, must be ≥ startRow.
Max Rowsnumber0Always0 = unlimited. Applied after filters.

Notes:

  • Range Mode auto: detects data region automatically (Excel). Only maxRows applies.
  • Range Mode manual: enables startRow, endRow, and column ranges (Excel).
  • Range Mode headerSearch: header row found by text search; see Headers.

Columns

Column Range (CSV and Excel manual mode)

FieldTypeDefaultDescription
Start Columnstring""A/1/AA or header name; empty = first column.
End Columnstring""A/1/AA/header name; 0 or empty = last column.

Column Selection Mode

FieldTypeDefaultDescription
Column ModeenumallOne of all
Columnsarray[]Used when columnMode !== 'all'. Each: { column: string; rename?: string }.
  • Include: only listed columns.
  • Exclude: all except listed.
  • Map: select and optionally rename.

Headers

FieldTypeDefaultVisible WhenDescription
Has HeadersbooleantrueAlwaysFirst row contains header names.
Include HeadersbooleantrueHas HeadersInclude the header row in output metadata.
Normalize HeadersbooleantrueHas HeadersTrim whitespace; handle duplicates with suffix _2, _3, ...
Header Search Textstring""Has HeadersOptional for CSV/Excel auto/manual; required for Excel headerSearch.
Header Search Columnstring""When search text setOptional column limit (A/1/header).

Filters

filterGroups: { op: 'AND'|'OR', conditions: FilterCondition[] }[]

FilterCondition: { column: string, type: FilterType, value: string }

  • String: equals|contains|startsWith|endsWith|matches
  • Numeric: greaterThan|lessThan|greaterOrEqual|lessOrEqual

Notes:

  • All groups combine with AND.
  • Column names are case-insensitive.
  • Missing columns treated as empty strings.
  • matches uses regex; invalid patterns rejected.
  • Numeric types require numeric value (commas allowed; stripped).

Processing

FieldTypeDefaultVisible WhenDescription
Trim WhitespacebooleantrueAlwaysTrim surrounding whitespace per value.
Skip Empty RowsbooleantrueAlwaysSkip rows with no meaningful content.
Preserve TypesbooleantrueExcel onlyKeep numbers/booleans typed; dates as ISO strings.

Settings (Defaults)

FieldTypeDefaultDescription
Retry on FailbooleanfalseRetry on transient errors.
On ErrorenumstopPipelineOne of stopPipeline

Documentation (Defaults)

FieldTypeDefaultDescription
Notesstring""Internal notes.
Display Note in PipelinebooleanfalseShow notes on the canvas.

Output Format

Example output:

{
"success": true,
"rows": [
{ "Name": "John", "Age": 30, "Status": "Active" },
{ "Name": "Jane", "Age": 25, "Status": "Active" }
],
"metadata": {
"totalRows": 100,
"filteredRows": 2,
"returnedRows": 2,
"headers": ["Name", "Age", "Status"],
"format": "csv",
"sheetName": "Sheet1"
}
}

Error Packet

{
"success": false,
"error": "Sheet 'Data' not found in workbook",
"errorType": "SheetNotFound"
}

Common errors: Invalid format; missing file data; invalid base64; invalid file; sheet not found; invalid column reference; header not found; invalid filter.


Validation Rules

  • label: required (non-empty)
  • fileFormat: must be CSV|Excel
  • inputField: required (non-empty)
  • CSV: delimiter required (exactly 1 char)
  • Excel: rangeMode in auto|manual|headerSearch
  • Excel header search: headerSearchText required when rangeMode==='headerSearch'
  • startRow: ≥ 1 when provided; endRow: 0 or ≥ startRow; maxRows: ≥ 0
  • columnMode: one of all|include|exclude|map; columns required when mode ≠ all
  • filterGroups: each group has op in AND|OR and at least one condition; types valid; regex validated; numeric values parseable

Using with Local File "Fetch"

This node consumes file content produced earlier in the pipeline.

  1. Local File connection → Function: Fetch Local File
  2. Connect the function node’s result output to the File Extractor’s defaultIn input.
  3. Ensure inputField targets the file content. Default data.data matches common File Fetch output.

Examples

  • CSV: inputField: 'data.data', fileFormat: 'CSV', delimiter: ',', hasHeaders: true, columnMode: 'include', columns: [{ column: 'Date' }, { column: 'Total' }]
  • Excel: inputField: 'data.data', fileFormat: 'Excel', sheetName: 'Summary', rangeMode: 'headerSearch', headerSearchText: 'Customer ID', columnMode: 'map', columns: [{ column: 'A', rename: 'id' }, { column: 'CustomerName', rename: 'name' }]

Common Use Cases

  • Simple CSV import with headers (all defaults)
  • Excel sheet extraction with mapped columns (use Map mode)
  • Filtered data with row limit (combine filters + maxRows)
  • Excel with variable header position (use Header Search)
  • Skipping metadata rows (set Start Row)

Tips & Troubleshooting

  • Use Header Search for variable report layouts.
  • Enable Trim Whitespace to avoid subtle mismatches.
  • Start with small maxRows during testing.
  • Use Column Mapping to standardize names.
  • Enable Skip Empty Rows to clean output.
  • Preserve Types (Excel) to maintain numeric/boolean fidelity.
  • Test regex patterns before production.

Version History

Version 1.0 - Initial unified node release

  • CSV and Excel file support
  • Advanced filtering
  • Column selection and mapping
  • Header search capability
  • Excel range modes (auto/manual/headerSearch)