# Machine Learning Models in Fusion

Fusion provides the following tools required for the model training process:

* Solr can easily store all your training data.
* Spark jobs perform the iterative machine learning training tasks.
* Fusion’s blob store makes the final model available for processing new data.

## Training Models

<Note>
  The approach for training models explained in this section still works in Fusion 4.2. An alternative approach introduced in Fusion 3.1 lets you create model-training jobs in the Fusion UI. See [Machine Learning in Lucidworks Fusion](https://lucidworks.com/post/machine-learning-in-lucidworks-fusion/) for more information.
</Note>

An example Scala script to train an SVM-based sentiment classifier for tweets is provided in the [`spark-solr`](https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/example/ml/) repository.

The following diagram depicts this process:

<img src="https://mintcdn.com/lucidworks/9iE2X4O8aa8U8XL3/assets/images/common/Supervised_MachineLearning_ModelTraining_Workflow.png?fit=max&auto=format&n=9iE2X4O8aa8U8XL3&q=85&s=674d55d77d251e432e3e13c7aef00f92" alt="Model Training Processes" width="2626" height="1278" data-path="assets/images/common/Supervised_MachineLearning_ModelTraining_Workflow.png" />

## Model Prediction

Fusion’s blob store requires every stored object to have a unique ID.
Once the model is stored in the blob store, it is available to Fusion’s index and query
Machine Learning pipeline stages, which use the model to make predictions for new data in
pipeline documents and queries.
The following diagram shows how this process works:

<img src="https://mintcdn.com/lucidworks/9iE2X4O8aa8U8XL3/assets/images/common/Supervised_MachineLearning_ModelServing_Workflow.png?fit=max&auto=format&n=9iE2X4O8aa8U8XL3&q=85&s=ac4a310d809786f8b0e168c7953b3eea" alt="Model Serving Processes" width="2626" height="1372" data-path="assets/images/common/Supervised_MachineLearning_ModelServing_Workflow.png" />

## Model Checking

To check the quality of your model in Fusion,
first create either an index pipeline
or a query pipeline that contains a Machine Learning stage configured to use your
model to make predictions on your data.
Then send a document or query through that pipeline
for which you already know what the predicted value should be, and compare the prediction with that value.
For example, given a trained sentiment classifier and an index stage configured to use it,
the following document should be classified as a highly positive tweet, with a value close to 1.0 in the `sentiment_d` field:

```json wrap  theme={"dark"}
{ "id":"tweets-2",
  "fields": [
    { "name": "tweet_txt",
      "value": "I am super excited that spring is finally here, yay! #happy" }
  ]
}
```
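The comparison itself can be sketched in a few lines of Python. This is only an illustration: it assumes the pipeline output has the same `fields` list shape as the input document, the `processed` document and its `0.97` score are made up, and the tolerance of `0.1` is an arbitrary choice rather than a Fusion default.

```python
import math


def get_field(doc, name):
    """Return the value of the first field called `name`, or None if absent."""
    for field in doc.get("fields", []):
        if field.get("name") == name:
            return field.get("value")
    return None


def prediction_matches(doc, field, expected, tolerance=0.1):
    """True when `field` holds a number within `tolerance` of `expected`."""
    value = get_field(doc, field)
    if value is None:
        return False
    return math.isclose(float(value), expected, abs_tol=tolerance)


# Hypothetical pipeline output for the document above; real scores will differ.
processed = {
    "id": "tweets-2",
    "fields": [
        {"name": "tweet_txt",
         "value": "I am super excited that spring is finally here, yay! #happy"},
        {"name": "sentiment_d", "value": 0.97},
    ],
}

print(prediction_matches(processed, "sentiment_d", expected=1.0))
```

Running a handful of documents with known labels through the same check gives a quick sanity test before relying on the model for all incoming data.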

## Metadata file spark-mllib.json

The file `spark-mllib.json` contains metadata about the model implementation, in particular how the model derives feature vectors from a document or query.

The JSON object has the following attributes:

* `id`. A string label that is used as the model's unique ID in the Fusion blob store, for example, `tweets_sentiment_svm`.
* `modelClassName`. The name of the `spark-mllib` class or the custom Java class that implements the `com.lucidworks.spark.ml.MLModel` interface.
* `featureFields`. A list of one or more field names.
* `vectorizer`. Specifies the processing required to derive a vector of features from the contents of the document fields listed in the `featureFields` entry.
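A metadata file can be checked against this attribute list before it is uploaded. The following Python sketch is not part of Fusion; the function name `check_metadata` is hypothetical, and it assumes that all four attributes are required and that `featureFields` must be a non-empty list.

```python
import json

# Attribute names taken from the list above; treating all four as required is an assumption.
REQUIRED_ATTRIBUTES = ("id", "modelClassName", "featureFields", "vectorizer")


def check_metadata(text):
    """Parse spark-mllib.json text and return a list of problems (empty if none)."""
    meta = json.loads(text)
    problems = [
        f"missing attribute: {key}"
        for key in REQUIRED_ATTRIBUTES
        if key not in meta
    ]
    fields = meta.get("featureFields")
    if fields is not None and (not isinstance(fields, list) or not fields):
        problems.append("featureFields must be a non-empty list of field names")
    return problems


# An incomplete metadata object used only to exercise the check.
example = '{"id": "tweets_sentiment_svm", "featureFields": ["tweet_txt"]}'
for problem in check_metadata(example):
    print(problem)
```

An empty result means the file has the expected top-level shape; it does not confirm that the named model class or vectorizer steps exist.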

The following example shows the `spark-mllib.json` file for the model with the ID `tweets_sentiment_svm`:

```json wrap  theme={"dark"}
{
  "id": "tweets_sentiment_svm",
  "modelClassName": "org.apache.spark.mllib.classification.SVMModel",
  "featureFields": [
    "tweet_txt"
  ],
  "vectorizer": [
    {
      "lucene-analyzer": {
        "analyzers": [
          {
            "name": "std_tok_lower",
            "tokenizer": {
              "type": "standard"
            },
            "filters": [
              {
                "type": "lowercase"
              }
            ]
          }
        ],
        "fields": [
          {
            "regex": ".+",
            "analyzer": "std_tok_lower"
          }
        ]
      }
    },
    {
      "hashingTF": {
        "numFeatures": "1000000"
      }
    }
  ]
}
```

The `vectorizer` consists of two steps: a `lucene-analyzer` step followed by a `hashingTF` step. The `lucene-analyzer` step can use any Lucene analyzer to perform text analysis.
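To make the two steps concrete, here is a rough Python sketch of what such a vectorizer does conceptually. It is not the Fusion or Spark implementation: the regular expression only approximates Lucene's standard tokenizer, and `zlib.crc32` stands in for the hash function that Spark's `HashingTF` uses, so the bucket indices will not match those of a real model.

```python
import re
import zlib


def analyze(text):
    """Approximate std_tok_lower: split on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def hashing_tf(tokens, num_features):
    """Hash each token to a bucket in [0, num_features) and count hits per bucket."""
    counts = {}
    for token in tokens:
        # Stand-in hash; Spark's HashingTF uses a different hash function.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts


tokens = analyze("I am super excited that spring is finally here, yay! #happy")
vector = hashing_tf(tokens, num_features=1_000_000)

print(tokens)
print(len(vector), "non-zero entries in a vector of length 1000000")
```

The result is a sparse term-frequency vector: only the buckets hit by a token are stored, which is why a large `numFeatures` value such as 1000000 stays cheap while keeping hash collisions rare.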

Other available vectorizer operations include the MLlib normalizer, the standard scaler, and the ChiSq selector. For usage of the standard scaler, see the examples in the [`spark-solr`](https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/example/ml/) repository.
