# Settings

Sometimes the AI needs a subtle kick, whether by restricting its choices, expanding them, or simply ordering it to be more spontaneous. That's where generation settings come in. Which ones a service offers can differ, but this page explains the most common options you have available to fiddle with when writing your way out isn't working.

### Randomness

Randomness, **better known as Temperature**, is a sampling method in itself. It divides each logit by a set temperature value, then runs the scaled values through a SoftMax function that converts them into probability percentages, thereby obtaining the sampling probabilities.

Lower randomness values increase the confidence the model has in its highest probability choices, while temperature values larger than one decrease that confidence in relation to the lower probability choices. In other words, the distribution of percentage values changes depending on your temperature value.

So in practice, increasing temperature makes more unusual words appear in outputs, which can lead to more verbose and interesting language, but can also result in 'semantic drift', where the AI gets caught up on these unconventional terms and goes off on its own merry way, or decides to take an unlikely route in a moment of uncertainty.

A temperature of 0 is equivalent to argmax/maximum-likelihood decoding (always pick the single most likely token), while an infinite temperature corresponds to uniform sampling.
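The scaling-then-SoftMax step described above can be sketched as follows (toy logit values and a made-up `sample_with_temperature` helper, purely for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Divide logits by temperature, apply SoftMax, then sample.

    `logits` maps tokens to raw scores; lower temperature sharpens
    the resulting distribution, higher temperature flattens it.
    """
    scaled = {tok: l / temperature for tok, l in logits.items()}
    # SoftMax: subtract the max first for numerical stability.
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Sample one token according to the resulting probabilities.
    choice = random.choices(list(probs), weights=probs.values())[0]
    return choice, probs

logits = {"walk": 4.0, "run": 3.0, "nap": 1.0}
_, cold = sample_with_temperature(logits, 0.5)
_, hot = sample_with_temperature(logits, 2.0)
# Low temperature concentrates mass on the top token ("walk");
# high temperature shifts mass toward the long shots ("nap").
```

Running this shows `cold["walk"]` well above `hot["walk"]`, which is exactly the confidence shift the paragraphs above describe.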

Anecdotally, increasing temperature has been said to speed up the pace of a story and introduce more imaginative outputs, so in moments of monotony it may be wise to temporarily bump it up to keep things exciting. That is effectively what Randomness does in your stories: it picks up the pace or introduces avant-garde language.

### Output Length

The number of tokens you want the AI to output each time. In NovelAI, the displayed number represents characters rather than tokens, with each token counted as four characters; dividing the default setting of 160 by 4 yields 40 tokens. Set this to the amount you want the AI to come up with each time. There isn't really an optimal amount to go for (some say max, some say 100 tokens, some say 60, etc.), but there is something to consider.

**The longer your output length, the further the AI gets from your injected text.**

This isn't a big deal for Memory and other elements that are inserted at the back of context, but for injections closer to the front of context, such as Author's Note, there can be a difference between three 40-token generations and one 150-token generation. Is this difference notable enough to influence your turning of the dial? Probably not.
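The character-to-token conversion mentioned above reduces to a one-liner (assuming the four-characters-per-token rule of thumb described in this section):

```python
def characters_to_tokens(characters, chars_per_token=4):
    """Convert NovelAI's character-based output length into tokens,
    assuming the ~4 characters-per-token rule of thumb."""
    return characters // chars_per_token

tokens = characters_to_tokens(160)  # a 160-character setting -> 40 tokens
```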

### Top-K Sampling

Top-K Sampling was created because, as we all know, relying only on temperature to smooth things out really doesn't help as much as it frankly should.

The way this one works is it sorts all potential tokens by probability and removes them from least likely to most likely, stopping only when it reaches the kth token, k here representing whatever you set the option to. After shortening the list to those k tokens, the probability is redistributed accordingly. Effectively, Top-K condenses the pool of possible tokens by filtering out the really unlikely, useless ones.

Top-K is okay, but its biggest downside is that it doesn't change at all for times when there are a range of equally valid choices. Whether your sentence is *"I took my dog out for a ____"* or *"Today, I ate a _____"*, Top-K applies the exact same strategy.
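A minimal sketch of the filtering step (toy probabilities, not a real model's vocabulary):

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then redistribute
    (renormalize) the probability mass among the survivors."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

probs = {"walk": 0.5, "run": 0.3, "swim": 0.15, "xylophone": 0.05}
filtered = top_k_filter(probs, k=2)
# Only "walk" and "run" survive; their probabilities now sum to 1.
```

Note that `k=2` here is fixed regardless of the input distribution, which is precisely the inflexibility the paragraph above complains about.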

### Nucleus Sampling

Nucleus Sampling (also known as Top-p Sampling) tackles the issue laid out with Top-k sampling, and it does this by working with a cumulative probability.

Whatever value you set as Nucleus Sampling is the cumulative probability target it wants to reach; it adds up token probabilities, from most likely downward, to find the smallest set of tokens whose combined probability exceeds your chosen value. That set then becomes the pool of words to pick from.

Having set p=0.92, Top-p sampling picks the minimum number of words that together exceed 92% of the probability mass. For an open-ended prompt like *"I took my dog out for a ____"*, that might mean keeping many candidate words, whereas a more constrained prompt like *"Today, I ate a _____"* might only need the top few to exceed 92%.

This means that Nucleus Sampling accounts for times when the next likely tokens are obvious (the token set may be smaller) and times when there are many equally valid tokens (the token set may be larger).
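The cumulative cutoff can be sketched like so (toy distributions standing in for the two prompt types above; not NovelAI's exact implementation):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    exceeds p, then renormalize among the survivors."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative > p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

# A flat distribution ("I took my dog out for a ____") keeps more
# tokens than a peaked one ("Today, I ate a ____") at the same p.
flat = {"walk": 0.3, "run": 0.3, "swim": 0.25, "jog": 0.15}
peaked = {"pizza": 0.85, "salad": 0.10, "rock": 0.05}
wide = top_p_filter(flat, p=0.9)
narrow = top_p_filter(peaked, p=0.9)
```

With the same `p=0.9`, `wide` keeps all four tokens while `narrow` keeps only two: the pool size adapts to the distribution, which is exactly what Top-K cannot do.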

### Tail-Free Sampling

Unlike Nucleus, Tail-Free Sampling (TFS) focuses on removing bad tokens rather than selecting good ones. It does this due to a flaw in Nucleus sampling: it does not attempt to provide a diverse range of viable tokens, which can lead to repetition. Roughly speaking, TFS looks at how quickly the sorted token probabilities fall off and cuts off the 'tail' where the curve flattens out into uniformly unlikely tokens.
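A sketch of the idea, based on the original TFS writeup rather than NovelAI's exact code (the boundary handling here is an assumption for illustration): sort the probabilities, take the absolute second derivative of the curve, normalize it, and keep tokens until its cumulative sum exceeds the threshold `z`.

```python
def tail_free_filter(probs, z=0.95):
    """Cut off the flat 'tail' of the sorted probability curve,
    located via the curve's (normalized, absolute) second derivative.
    Illustrative sketch only."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    p = [prob for _, prob in items]
    d1 = [p[i + 1] - p[i] for i in range(len(p) - 1)]
    d2 = [abs(d1[i + 1] - d1[i]) for i in range(len(d1) - 1)]
    total = sum(d2)
    if total == 0:  # perfectly flat curve: keep everything
        return dict(items)
    cum, keep = 0.0, len(p)
    for i, w in enumerate(d2):
        cum += w / total
        if cum > z:  # curvature mass exhausted: the tail starts here
            keep = i + 1
            break
    kept = dict(items[:max(keep, 1)])
    norm = sum(kept.values())
    return {tok: prob / norm for tok, prob in kept.items()}

probs = {"walk": 0.5, "run": 0.3, "swim": 0.1, "a": 0.05, "the": 0.05}
filtered = tail_free_filter(probs, z=0.9)
# The near-uniform 0.05 tail gets trimmed; the steep head survives.
```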

### Repetition Penalty

Makes the AI repeat stuff less, by penalizing tokens that have already appeared in the context.

### Frequency & Presence Penalty

The Frequency and Presence penalties are hidden experimental settings only available by exporting and editing a preset. They're based on OpenAI's repetition penalty implementation, and are arguably superior to the rep-pen settings exposed in the UI. They work by modifying the logits of a given token with an additive contribution as follows (assuming NAI's implementation follows OAI's):

`mu[x] -> mu[x] - c[x] * frequency - float(c[x] > 0) * presence`

Where:

- mu[x] is the logit of a token
- c[x] is how often the token appears in context
- float(c[x] > 0) is 1 if c[x] > 0 and 0 otherwise
- frequency and presence are the coefficients for each per your preset

Presence penalty modifies the probability of all tokens that have appeared at least once by a set amount, and frequency penalty modifies probability proportionally to how many times the token appears in context. Frequency penalty is the only form of rep-pen that can apply itself more than once, making it strong for preventing loops, punctuation repetition, and structural repetition. Note that higher values can badly mangle punctuation and seriously degrade prose quality.
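The formula above translates directly into code (toy logits and counts; assuming, as this section does, that NAI mirrors OpenAI's additive form):

```python
def apply_penalties(logits, counts, frequency, presence):
    """Apply mu[x] -> mu[x] - c[x]*frequency - (c[x] > 0)*presence.

    `logits` maps tokens to raw scores; `counts` maps tokens to how
    often they already appear in the context."""
    return {
        tok: mu
        - counts.get(tok, 0) * frequency
        - (1.0 if counts.get(tok, 0) > 0 else 0.0) * presence
        for tok, mu in logits.items()
    }

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
counts = {"the": 3, "cat": 1}  # "sat" hasn't appeared yet
penalized = apply_penalties(logits, counts, frequency=0.2, presence=0.5)
# "the" is hit hardest (2.0 - 3*0.2 - 0.5 = 0.9): the frequency term
# stacks per occurrence, while presence applies once per seen token.
```

Note how `"sat"` is untouched: both terms vanish when `c[x] = 0`, so only tokens already in context get pushed down.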

### Token Probability Viewer

*See Token Probabilities.*

## Community-Made Settings

A collection of NovelAI presets exists here.