Parsers provide fine-grained configuration for inbound data. You configure parsers with stages, much like index pipelines and query pipelines. Parsers can include conditional parsing and nested parsing. You can configure them through the Lucidworks Search UI or the Parsers API.

Connectors receive the inbound data, convert it into a byte stream, and send the byte stream to a parser’s configured parsing stages. The parser selects a parsing stage to handle the stream; that stage parses the data and produces documents that are sent to the index pipeline.

Each parsing stage evaluates whether the inbound stream matches the stage’s default media types or filename extensions. The first stage that finds a match processes the data and can output one or both of the following:
- Zero or more pipeline documents for consumption by the index pipeline
- Zero or more new input streams for re-parsing
This recursive approach is useful for containers (for example, zip and tar files). The output of the container parsing can be another container or a stream of uncompressed content that requires its own parsing.
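The stage-matching and re-parsing flow described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the names `Stream`, `StageOutput`, `ZipStage`, `TextStage`, `run_parser`, and `make_zip` are hypothetical, matching is done on filename extension only, and a stream that exceeds the depth limit is simply dropped.

```python
import io
import zipfile
from dataclasses import dataclass, field


@dataclass
class Stream:
    """An inbound byte stream plus the name used for stage matching."""
    name: str
    data: bytes


@dataclass
class StageOutput:
    """A stage may emit documents, new streams to re-parse, or both."""
    documents: list = field(default_factory=list)
    streams: list = field(default_factory=list)


class ZipStage:
    """Container stage: emits one new stream per archive member."""
    extensions = (".zip",)

    def parse(self, stream):
        out = StageOutput()
        with zipfile.ZipFile(io.BytesIO(stream.data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    out.streams.append(Stream(info.filename, zf.read(info)))
        return out


class TextStage:
    """Leaf stage: emits a single document for a plain-text stream."""
    extensions = (".txt",)

    def parse(self, stream):
        body = stream.data.decode("utf-8")
        return StageOutput(documents=[{"id": stream.name, "body": body}])


def run_parser(stream, stages, max_depth=16, depth=0):
    """Hand the stream to the first matching stage, then re-parse its output streams."""
    if depth > max_depth:
        return []  # assumption: streams nested too deeply are dropped
    for stage in stages:
        if stream.name.lower().endswith(stage.extensions):
            out = stage.parse(stream)
            docs = list(out.documents)
            for child in out.streams:
                docs.extend(run_parser(child, stages, max_depth, depth + 1))
            return docs  # only the first matching stage handles the stream
    return []  # no stage matched


def make_zip(files):
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


inner = make_zip({"a.txt": b"hello"})
outer = make_zip({"inner.zip": inner, "b.txt": b"world"})
docs = run_parser(Stream("outer.zip", outer), [ZipStage(), TextStage()])
```

Here the outer archive yields a nested archive and a text file. The nested archive is fed back through the same stage list, so both text files end up as documents.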
| Field | Description |
|---|---|
| Document ID Source Field | Field in the source file that contains the document ID |
| Maximum Parser Recursion Depth | Maximum number of times the parser may recurse over the file before proceeding to the next parser. This is useful for files with hierarchical structures (for example, zip and tar files). |
| Enable automatic media type detection | Whether to automatically detect the media type of the source files. If disabled, the parser uses the media type application/octet-stream. |
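As a rough illustration of how the three settings in the table might influence parsing, the following Python sketch models them as a plain data class. The class `ParserSettings`, its attribute names, its default values, and the helper functions are all hypothetical and are not the Parsers API schema. Media type detection here is a filename-based guess from Python's standard `mimetypes` module, which may differ from how the product detects types.

```python
import mimetypes
from dataclasses import dataclass

# Media type used when automatic detection is disabled (from the table above).
FALLBACK_MEDIA_TYPE = "application/octet-stream"


@dataclass
class ParserSettings:
    """Hypothetical stand-in for the three parser-level settings."""
    id_source_field: str = "id"          # Document ID Source Field (illustrative default)
    max_recursion_depth: int = 16        # Maximum Parser Recursion Depth (illustrative default)
    auto_detect_media_type: bool = True  # Enable automatic media type detection


def resolve_media_type(filename, settings):
    """Return the media type a stream would be matched against."""
    if not settings.auto_detect_media_type:
        return FALLBACK_MEDIA_TYPE
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or FALLBACK_MEDIA_TYPE


def document_id(record, settings):
    """Read the document ID from the configured source field, if present."""
    return record.get(settings.id_source_field)


detected = resolve_media_type("report.pdf", ParserSettings())
undetected = resolve_media_type(
    "report.pdf", ParserSettings(auto_detect_media_type=False)
)
sku_id = document_id({"sku": "A1"}, ParserSettings(id_source_field="sku"))
```

With detection disabled, every stream resolves to `application/octet-stream`, so only stages that accept that media type (or match by filename extension) would be candidates for the stream.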