Job configuration specifications
Train a Smart Answers cold start model
word_custom or bpe_custom.
This trains Word2vec on the data and field specified in Training collection and Field, where the field contains the content documents. This can be useful when your content includes unusual or domain-specific vocabulary.
When you use the pre-trained embeddings, the log shows the percentage of processed vocabulary words. If this value is high, then try using custom embeddings.
During training, the job analyzes the content data to select a weight for each word. The resulting model computes the weighted average of the word embeddings to obtain a single dense vector for the content.
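As a rough illustration of the weighted averaging described above, the following is a minimal sketch and not the job's actual implementation. The function name, the toy two-dimensional embeddings, and the per-word weights are all hypothetical. The source does not say how the job derives its weights.

```python
from typing import Dict, List


def weighted_average_embedding(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> List[float]:
    """Combine word vectors into one dense vector by weighted averaging.

    Words without an embedding are skipped. Words without an explicit
    weight fall back to default_weight (an assumption made for this sketch).
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # out-of-vocabulary word: no vector to add
        weight = weights.get(token, default_weight)
        for i, value in enumerate(vector):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # nothing matched: return the zero vector
    return [value / weight_sum for value in total]


# Hypothetical toy data: two words with 2-dimensional embeddings.
toy_embeddings = {"solr": [1.0, 0.0], "search": [0.0, 1.0]}
toy_weights = {"solr": 3.0, "search": 1.0}
content_vector = weighted_average_embedding(
    ["solr", "search", "unknownword"], toy_embeddings, toy_weights
)
```

In this toy case the word with the larger weight pulls the content vector toward its own embedding, which is the effect the learned weights are meant to have.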
random_* dynamic field defined in its managed-schema.xml. This field is required for sampling the data. If it is not present, add the entry <dynamicField name="random_*" type="random"/> to managed-schema.xml alongside the other dynamic fields, and add the entry <fieldType class="solr.RandomSortField" indexed="true" name="random"/> alongside the other field types.
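To show why a random_* field makes sampling possible, here is a hedged sketch of a Solr query that sorts on such a field. Solr's RandomSortField returns documents in a pseudo-random order determined by the field name, so changing the suffix acts as a seed. The host, collection name, and helper function below are hypothetical, and the training job may issue its sampling queries differently.

```python
from urllib.parse import urlencode


def random_sample_url(base_url: str, collection: str, seed: int, rows: int) -> str:
    """Build a Solr select URL that returns `rows` documents in random order.

    The sort field "random_<seed>" matches the random_* dynamic field, so
    each distinct seed yields a different but repeatable ordering.
    """
    params = {
        "q": "*:*",                      # match all documents
        "rows": rows,                    # sample size
        "sort": f"random_{seed} asc",    # order by the random sort field
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical Solr location and collection name.
sample_url = random_sample_url("http://localhost:8983/solr", "my_collection", 42, 100)
```

If the random_* dynamic field is missing from the schema, a query like this fails because Solr cannot resolve the sort field, which is why the schema entries are required.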