
# Data Sources

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};


Appkit can obtain and present information from multiple sources simultaneously. It is not limited to search engines: it also integrates with data warehouses, web service endpoints, and other information sources that expose an API. The platform abstraction covers both the search and storage sides, meaning that Appkit implements, for each platform, the protocols needed to save and index content. A number of platform adapters are available out of the box, and [adding new ones](/docs/4/app-studio/concepts/extending-appkit/developing-platforms) is a straightforward exercise.


## Platform adapters

The layer that relays Appkit-specific instructions to the underlying data provider is called the platform adapter layer. It is responsible for translating, for example, an Appkit query into a command the underlying platform understands and, conversely, an engine-specific response into a generic Appkit response.
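
The two translation directions can be sketched as follows. Everything here is hypothetical: `GenericQuery`, `GenericResponse`, `PlatformAdapter`, and `SolrStyleAdapter` are illustrative names, not Appkit's platform API. The toy adapter targets the standard Solr `q`, `start`, and `rows` request parameters and, for brevity, models the engine response as a plain map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical engine-neutral query and response types (not Appkit classes).
record GenericQuery(String text, int offset, int limit) {}
record GenericResponse(long total, List<String> ids) {}

// Hypothetical adapter contract: one method per translation direction.
interface PlatformAdapter<Req, Res> {
    Req toEngineRequest(GenericQuery query);
    GenericResponse fromEngineResponse(Res engineResponse);
}

// Toy adapter producing Solr-style request parameters.
class SolrStyleAdapter implements PlatformAdapter<Map<String, String>, Map<String, Object>> {
    @Override
    public Map<String, String> toEngineRequest(GenericQuery query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query.text());                           // query string
        params.put("start", Integer.toString(query.offset()));   // first result offset
        params.put("rows", Integer.toString(query.limit()));     // page size
        return params;
    }

    @Override
    @SuppressWarnings("unchecked")
    public GenericResponse fromEngineResponse(Map<String, Object> engineResponse) {
        // Engine-specific field names are mapped back to the generic response.
        long numFound = ((Number) engineResponse.get("numFound")).longValue();
        List<String> ids = (List<String>) engineResponse.get("ids");
        return new GenericResponse(numFound, ids);
    }
}
```

Keeping both directions in one interface means each new platform only has to supply these two mappings, while the rest of the application works with the generic types.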


### Search engines

* [Fusion](/docs/4/app-studio/reference/search-platforms/fusion/overview)
* [Solr](/docs/4/app-studio/reference/search-platforms/solr/overview)
* [SolrCloud](/docs/4/app-studio/reference/search-platforms/solr/solrcloud)
* [Elasticsearch](/docs/4/app-studio/reference/search-platforms/elasticsearch)

### Other data sources

* [Wolfram|Alpha](/docs/4/app-studio/reference/search-platforms/wolfram-alpha/overview)
* [Wikipedia](/docs/4/app-studio/reference/search-platforms/wikipedia)
* [Pipl](/docs/4/app-studio/reference/search-platforms/pipl)
* [YouTube](/docs/4/app-studio/reference/search-platforms/youtube)
* [Twitter](/docs/4/app-studio/reference/search-platforms/twitter)
* [Google Custom Search](/docs/4/app-studio/reference/search-platforms/google-custom-search)

## Store, search, load, delete

Provided the storage backend supports it, Appkit can store (index), fetch, and delete individual records, as well as search the backing data store.
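
A minimal sketch of these four operations, assuming a hypothetical `RecordStore` interface (not part of Appkit) and a naive in-memory backend that matches records by case-insensitive substring. A backend that lacks one of the capabilities could, for example, throw `UnsupportedOperationException` from that method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store/search/load/delete contract (illustrative only).
interface RecordStore {
    void store(String id, String content);   // store or index a record
    Optional<String> load(String id);        // fetch a single record
    boolean delete(String id);               // remove a record; true if it existed
    List<String> search(String term);        // ids of records containing the term
}

// Naive in-memory backend; a real backend would delegate to a search engine.
class InMemoryRecordStore implements RecordStore {
    private final Map<String, String> records = new ConcurrentHashMap<>();

    @Override public void store(String id, String content) { records.put(id, content); }

    @Override public Optional<String> load(String id) { return Optional.ofNullable(records.get(id)); }

    @Override public boolean delete(String id) { return records.remove(id) != null; }

    @Override
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        records.forEach((id, content) -> {
            if (content.toLowerCase().contains(needle)) hits.add(id);
        });
        return hits;
    }
}
```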

## Task support

Appkit supports long-running, managed tasks. This is useful for operations that take a long time to complete, such as Map/Reduce jobs. Appkit manages the task lifecycle: it initiates the task, tracks and reports progress, and finally presents the response. The response can then be rendered using Appkit user interface components, or passed to another platform for persistence, which can give you low-latency searching and faceting.

<Accordion title="Using the Appkit Tasks Module">

  The Task module lets you run multiple time-consuming tasks (atomic units of work) in the background while you continue to use your application. While your tasks are running, you can monitor their status, and stop, restart, or cancel them if needed.

  ## Setup

  ### Updating the pom.xml

  To add the Task module to a Maven project, add this dependency to your project’s `pom.xml` file:

  ```xml theme={"dark"}
  <dependency>
      <groupId>twigkit</groupId>
      <artifactId>twigkit.task</artifactId>
      <version>${project.parent.version}</version>
  </dependency>
  ```

  ### Configuring the task executor

  The Task module includes its own task executor that takes charge of managing your tasks. Under the hood, the executor uses a registry that can be configured in order to fine-tune how your tasks will be handled.

  To customize the configuration of the task executor, create a file named `executor.conf` and place it in `src/main/resources/conf/services/task/`. The following parameters can be set in that file:

  * `max-tasks-in-memory`: the maximum number of tasks the registry can hold in memory at any one time. The default is 100.
  * `max-tasks-on-disk`: the maximum number of tasks that can be persisted on disk. The default is 10000000.
  * `disk-store-location`: the location for storing and persisting tasks. The default is the value of the `java.io.tmpdir` JVM system property, which varies by operating system.
  * `time-to-idle`: the time in seconds a task may be left idle (measured from its last access or modification) before it is removed from the registry. The default is 3600.
  * `time-to-live`: the time in seconds a task may exist (measured from its creation) before it is removed from the registry, regardless of how often it is used. The default is effectively forever.
  * `overflow-to-disk`: whether additional tasks are stored on disk when the in-memory registry is full. The default is `true`.
  * `persist-to-disk`: whether tasks are persisted to disk, for example, when the in-memory registry is full or the application shuts down. The default is `true`.

  All of these parameters have default values, which are used if `executor.conf` cannot be found or if a parameter is not defined in it.
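
  The fallback rule (file missing or key undefined means the default applies) can be illustrated with a small resolver. This is not Appkit's configuration loader: `ExecutorSettings` is a hypothetical class, it assumes the simple `key: value` line format used by the task configuration files on this page, and it encodes the "effectively forever" `time-to-live` default as `0`, the value this page uses to disable expiration.

  ```java
  import java.util.HashMap;
  import java.util.Map;

  // Illustrative resolver: documented defaults, overridden by any keys present.
  class ExecutorSettings {
      static final Map<String, String> DEFAULTS = Map.of(
          "max-tasks-in-memory", "100",
          "max-tasks-on-disk", "10000000",
          "disk-store-location", System.getProperty("java.io.tmpdir"),
          "time-to-idle", "3600",
          "time-to-live", "0",              // 0 = no expiry (assumed encoding)
          "overflow-to-disk", "true",
          "persist-to-disk", "true");

      private final Map<String, String> values = new HashMap<>(DEFAULTS);

      // confText may be null, for example when executor.conf cannot be found.
      ExecutorSettings(String confText) {
          if (confText == null) return;
          for (String line : confText.split("\\R")) {
              int colon = line.indexOf(':');
              if (colon < 0) continue;                     // skip blank or malformed lines
              String key = line.substring(0, colon).trim();
              String value = line.substring(colon + 1).trim();
              if (DEFAULTS.containsKey(key)) values.put(key, value);  // ignore unknown keys
          }
      }

      int maxTasksInMemory() { return Integer.parseInt(values.get("max-tasks-in-memory")); }
      long timeToIdleSeconds() { return Long.parseLong(values.get("time-to-idle")); }
      boolean overflowToDisk() { return Boolean.parseBoolean(values.get("overflow-to-disk")); }
      String diskStoreLocation() { return values.get("disk-store-location"); }
  }
  ```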

  ### Writing your own tasks

  To submit a task to the task executor, the actual actions of the task (what it will do when it starts, stops, and is deleted) must first be defined. A task is written in Java and must follow a prescribed template. This template is as follows:

  ```java expandable theme={"dark"}
  import java.util.Map;
  import javax.xml.bind.annotation.XmlRootElement;
  // Also import Task, TaskException, Callback, and User from the Appkit libraries.

  @XmlRootElement
  public class MyTask extends Task {

      public MyTask() {
          this(null);
      }

      public MyTask(User owner) {
          super(owner);
      }

      public MyTask(User owner, Map<String, String> attributes) {
          super(owner, attributes);
      }

      @Override
      public void start() throws TaskException {
          // Put here what the task will do when started, for example, download a file
      }

      @Override
      public void start(Callback<?> callback) throws TaskException {
          // Work in progress; do not use.
      }

      @Override
      public void stop() throws TaskException {
          // Put here what the task will do when stopped
      }

      @Override
      public void delete() throws TaskException {
          // Put here what the task will do when deleted
      }
  }
  ```

  As the template shows, three methods define what the task does when it is started (`start`), when it is stopped (`stop`), and when it is deleted (`delete`) from the registry. These methods are called when particular RESTful endpoints are hit, as described below.

  ### Configuring your own tasks

  After you have written your task (see above), the next step is to create a configuration file for it. This lets you submit the task to the Task web service and define task-specific properties that are available to the task when it is started, stopped, and so on.

  As with any other configuration, the task configuration must be stored in `resources/conf`, for example, `resources/conf/tasks/download.conf`. The simplest configuration for a task would be one with just the `name` parameter:

  ```yaml theme={"dark"}
  name: package1.package2.package3.MyTask
  ```

  Where `name` is the fully-qualified class name of the Java class you have written to define your `Task`.

  The configuration can also include a display name, as well as any other attributes that the task might need, for example:

  ```yaml theme={"dark"}
  name: package1.package2.package3.MyTask
  display: Download
  destination: /downloads
  file-type: pdf
  zipped: true
  ```

  These additional attributes can be accessed within the task via `getAttributes()`.
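
  A task typically turns these string attributes into typed settings when it starts. The helper below is a hypothetical sketch: `DownloadSettings` is not an Appkit class, and it assumes `getAttributes()` returns a `Map<String, String>`, as the `Map<String, String>` constructor parameter in the task template suggests. The fallback values for `file-type` and `zipped` are illustrative.

  ```java
  import java.util.Map;

  // Hypothetical typed view of the download.conf attributes shown above.
  final class DownloadSettings {
      final String destination;
      final String fileType;
      final boolean zipped;

      private DownloadSettings(String destination, String fileType, boolean zipped) {
          this.destination = destination;
          this.fileType = fileType;
          this.zipped = zipped;
      }

      // Inside a task: DownloadSettings.from(getAttributes())
      static DownloadSettings from(Map<String, String> attributes) {
          String destination = attributes.get("destination");
          if (destination == null || destination.isBlank()) {
              throw new IllegalArgumentException("Task configuration must set 'destination'");
          }
          // Optional attributes fall back to illustrative defaults.
          String fileType = attributes.getOrDefault("file-type", "pdf");
          boolean zipped = Boolean.parseBoolean(attributes.getOrDefault("zipped", "false"));
          return new DownloadSettings(destination, fileType, zipped);
      }
  }
  ```

  Failing fast on a missing required attribute surfaces configuration mistakes when the task starts rather than partway through its work.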

  ## Usage

  ### Working with tasks via the Appkit Task RESTful endpoint

  The Task web service provides a number of RESTful endpoints that can be used to submit, query, and cancel your tasks. These endpoints are as follows:

  * `GET /twigkit/api/tasks`: returns a list of tasks owned by the current user.
  * `GET /twigkit/api/tasks/{id}`: returns information about the task with the given ID, including its status and any task-specific attributes.
  * `GET /twigkit/api/tasks/{id}/status`: returns the status of the task with the given ID.
  * `POST /twigkit/api/tasks/{task}`: submits the task with the given configuration name to the task executor. For example, if a task's configuration is stored in `resources/conf/tasks/download.conf`, the task name passed to the submit endpoint is `tasks.download`.
  * `POST /twigkit/api/tasks/{id}/stop`: stops the task with the given ID.
  * `POST /twigkit/api/tasks/{id}/restart`: restarts the task with the given ID.
  * `DELETE /twigkit/api/tasks/{id}`: cancels the task with the given ID. This calls `stop` and `delete` on the task and removes it from the registry.
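
  The endpoints listed above can be called from any HTTP client. This sketch only builds the requests, using the JDK's `java.net.http` API. The base URL, the empty POST body, and the `TaskApi` class are assumptions; authentication and response parsing are omitted. `taskName` assumes the name is the path relative to `resources/conf` with `/` replaced by `.` and `.conf` dropped, as the `tasks.download` example suggests. Task IDs are assumed to be URL-safe.

  ```java
  import java.net.URI;
  import java.net.http.HttpRequest;

  // Hypothetical request builder for the Task endpoints.
  class TaskApi {
      private final String baseUrl; // for example "http://localhost:8080" (placeholder)

      TaskApi(String baseUrl) { this.baseUrl = baseUrl; }

      // "tasks/download.conf" -> "tasks.download" (assumed naming rule)
      static String taskName(String confRelativePath) {
          String withoutExt = confRelativePath.endsWith(".conf")
              ? confRelativePath.substring(0, confRelativePath.length() - ".conf".length())
              : confRelativePath;
          return withoutExt.replace('/', '.');
      }

      HttpRequest list()              { return get("/twigkit/api/tasks"); }
      HttpRequest status(String id)   { return get("/twigkit/api/tasks/" + id + "/status"); }
      HttpRequest submit(String name) { return post("/twigkit/api/tasks/" + name); }
      HttpRequest stop(String id)     { return post("/twigkit/api/tasks/" + id + "/stop"); }
      HttpRequest restart(String id)  { return post("/twigkit/api/tasks/" + id + "/restart"); }

      HttpRequest cancel(String id) {
          return HttpRequest.newBuilder(URI.create(baseUrl + "/twigkit/api/tasks/" + id))
              .DELETE().build();
      }

      private HttpRequest get(String path) {
          return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
      }

      private HttpRequest post(String path) {
          return HttpRequest.newBuilder(URI.create(baseUrl + path))
              .POST(HttpRequest.BodyPublishers.noBody()).build();
      }
  }
  ```

  A request is then sent with, for example, `HttpClient.newHttpClient().send(api.submit("tasks.download"), HttpResponse.BodyHandlers.ofString())`.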

  ### Task persistence

  By default, tasks are periodically flushed to disk while the application is running. A `task-cache.data` file is created in the `disk-store-location` (see above) to keep track of tasks. When the application shuts down, an additional file, `task-cache.index`, is created to persist them. The next time the application runs, these tasks are available for continued use.

  To disable task persistence, set the configuration parameters `overflow-to-disk` and `persist-to-disk` to `false`.

  ### Task expiration

  By default, tasks will not be stored in the registry indefinitely. To customize how long tasks may be stored, two configuration options are available:

  * `time-to-live`

    This is the amount of time in seconds a task is allowed to exist before being removed from the registry, regardless of how often it is used. The default is effectively forever. Setting it is useful when resources are limited and tasks may no longer be needed after a certain period.
  * `time-to-idle`

    This is the amount of time in seconds a task is allowed to be left idle before being removed from the registry. The default is one hour.

    <Check>Whatever values you set, these timers are never reset, and they keep running even while the application is not running. If the application is shut down and enough time elapses for a task to expire, the task will no longer be available when the application restarts.</Check>

  To disable task expiration, set `time-to-live` and `time-to-idle` to 0.
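
  The two expiration rules can be expressed as a single check. This is an illustrative sketch, not Appkit's internal code: `ExpiryPolicy` is hypothetical, it treats `0` as "rule disabled" as described above, and it compares wall-clock instants, which is why time during which the application was down still counts.

  ```java
  import java.time.Duration;
  import java.time.Instant;

  // Illustrative expiry check for time-to-live and time-to-idle (0 disables a rule).
  final class ExpiryPolicy {
      private final long timeToLiveSeconds;
      private final long timeToIdleSeconds;

      ExpiryPolicy(long timeToLiveSeconds, long timeToIdleSeconds) {
          this.timeToLiveSeconds = timeToLiveSeconds;
          this.timeToIdleSeconds = timeToIdleSeconds;
      }

      // created: task creation time; lastUsed: last access or modification time.
      boolean isExpired(Instant created, Instant lastUsed, Instant now) {
          boolean liveExceeded = timeToLiveSeconds > 0
              && Duration.between(created, now).getSeconds() >= timeToLiveSeconds;
          boolean idleExceeded = timeToIdleSeconds > 0
              && Duration.between(lastUsed, now).getSeconds() >= timeToIdleSeconds;
          return liveExceeded || idleExceeded;
      }
  }
  ```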

  #### Deleting a task after expiration

  After a task has expired, it is not only deleted from the registry, but the `delete` method implemented within your task is also called. If there is any final action you would like your task to perform before it is removed, implement that action in the task's `delete` method.

  ### Task eviction

  By default, tasks are stored on disk if the capacity to store tasks in-memory is exceeded. In this case, if the number of tasks being stored exceeds the limit set by the configuration parameter `max-tasks-on-disk` then the least recently used task will be evicted from the registry. The default maximum number of tasks that can be stored on disk is 10000000.

  If instead you decide not to store tasks on disk (both `overflow-to-disk` and `persist-to-disk` are `false`) but would like to limit the number of tasks that can be stored in-memory, then the configuration parameter `max-tasks-in-memory` can be set. This will evict the least recently used task once capacity is reached.

  #### Deleting a task after eviction

  After a task has been evicted, it is not only deleted from the registry, but the `delete` method implemented within your task is also called. If there is any final action you would like your task to perform before it is removed, implement that action in the task's `delete` method.
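
  The eviction behavior described above (evict the least recently used task once capacity is reached, then call its `delete` method) is the classic LRU pattern. This sketch uses `java.util.LinkedHashMap` in access order; `LruTaskRegistry` is a hypothetical name, the callback stands in for the task's `delete` method, and this is not how Appkit's registry is implemented.

  ```java
  import java.util.LinkedHashMap;
  import java.util.Map;
  import java.util.function.Consumer;

  // Illustrative LRU registry; not thread-safe.
  class LruTaskRegistry<T> extends LinkedHashMap<String, T> {
      private final int maxTasks;
      private final Consumer<T> onEvict;

      LruTaskRegistry(int maxTasks, Consumer<T> onEvict) {
          super(16, 0.75f, true); // accessOrder = true: get() refreshes recency
          this.maxTasks = maxTasks;
          this.onEvict = onEvict;
      }

      @Override
      protected boolean removeEldestEntry(Map.Entry<String, T> eldest) {
          if (size() > maxTasks) {
              onEvict.accept(eldest.getValue()); // analogous to calling delete() on the task
              return true;                       // drop the least recently used entry
          }
          return false;
      }
  }
  ```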
</Accordion>

## Batch operation support

Appkit can page through and retrieve entire indices by combining task support with the Batch Platform Wrapper, which automatically offsets through any number of results from a platform and combines the responses.
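
The offset-and-combine idea can be sketched as a simple loop. `PageSource` and `BatchFetcher` are hypothetical names, not the Batch Platform Wrapper's API. The sketch assumes each call returns at most `rows` results starting at `offset`, and stops at the first short or empty page. In Appkit, a loop like this would typically run inside a managed task so that its progress can be tracked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page-fetching contract: at most `rows` results from `offset`.
interface PageSource<T> {
    List<T> fetch(int offset, int rows);
}

final class BatchFetcher {
    private BatchFetcher() {}

    // Walks the result set page by page and concatenates every page.
    static <T> List<T> fetchAll(PageSource<T> source, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        List<T> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<T> page = source.fetch(offset, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) return all; // short page: no more results
            offset += page.size();
        }
    }
}
```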
