# Spark Operations

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};

[Apache Spark](http://spark.apache.org/) is an open-source cluster-computing framework. It serves as a fast, general-purpose execution engine for large-scale data processing jobs that can be decomposed into stepwise tasks and distributed across a cluster of networked computers.

<LwTemplate />

Spark improves on previous MapReduce implementations by using resilient distributed datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.
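To make the RDD idea more concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the `ToyRDD` class and its `compute_partition` method are hypothetical names invented for illustration. The sketch assumes that fault tolerance comes from recording transformations as a lineage, so that any single partition can be rebuilt from the source data by replaying that lineage. It does not model caching or distribution across machines.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark API).

    Holds source data split into partitions plus a lineage of
    per-partition transformations that are applied lazily.
    """

    def __init__(self, source_partitions, lineage=()):
        self._source = source_partitions  # one list per partition
        self._lineage = lineage           # tuple of partition -> partition functions

    def map(self, fn):
        # Lazy: record the step in the lineage instead of running it now.
        step = lambda part, fn=fn: [fn(x) for x in part]
        return ToyRDD(self._source, self._lineage + (step,))

    def filter(self, pred):
        # Lazy: record the step in the lineage instead of running it now.
        step = lambda part, pred=pred: [x for x in part if pred(x)]
        return ToyRDD(self._source, self._lineage + (step,))

    def compute_partition(self, index):
        # Rebuild one partition from the source by replaying the lineage.
        # In this toy model, a "lost" partition is recovered the same way.
        part = list(self._source[index])
        for step in self._lineage:
            part = step(part)
        return part

    def collect(self):
        # Action: compute every partition and concatenate the results.
        out = []
        for i in range(len(self._source)):
            out.extend(self.compute_partition(i))
        return out


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())             # [4, 16, 36]
print(squares_of_evens.compute_partition(1))  # [16, 36]
```

Nothing runs until `collect` or `compute_partition` is called, and each partition is computed independently of the others. That independence is what would let a real engine distribute partitions across workers and recompute only the ones that are lost.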

These topics explain Spark administration concepts in Fusion 5:

* [Spark Job Drivers](/docs/5/fusion/intro/fusion-stack/spark/spark-job-drivers)
* [Spark Administration in Kubernetes](/docs/5/fusion/operations/survival-guide/spark-kubernetes-overview)

Spark operations include:

1. Audit all [Spark jobs](/docs/5/fusion/operations/jobs-and-scheduling/spark-jobs) for natural key support.
2. Audit [SQL Aggregation](/docs/5/fusion/reference/config-ref/jobs/aggregations/sql-aggregation) jobs for natural key usage.
3. When reviewing the SQL for the [BPR Recommender job](/docs/5/fusion/reference/config-ref/jobs/bpr-recommender), audit the generated aggregation SQL to ensure that it uses a natural key projection.
4. Support partitioning in all Spark jobs according to each job's configuration options.
5. Support external data sources for all jobs (Spark, NLP, Clustering, Recommender), including external Spark source support for NLP.
