Phrase Biasing

Link to the original rentry: https://rentry.org/biases#but-what-are-phrase-biases

The NovelAI implementation of token biasing. NovelAI's unique Phrase Biasing is far more powerful than the logit biasing we implemented, tested, and open-sourced earlier but found unsatisfactory.

For example, we can bias positively for, ,  ,   and make sure that Sigurd sticks to a first-person narrative.

''Another example would be to bias for various classes in an RPG such as a,  , or  , or against fantasy races and creatures. Alpha test users have reported being able to get a very distinct experience without the use of AI Modules or Lorebooks with Phrase Biasing alone!''

''Possible bias values are decimal numbers ranging from -2.0 (for negative bias) to 0 (for no bias) to 2.0 (for positive bias). Please note that the strength of biasing is logarithmic, not linear. For example, -2.0 is much, much stronger than -1.0. We've observed that bias values in the range of -0.4 to 0.4 are often enough to have a good effect.''

If the only thing you want is useful biases others have made, then head to our bias collection. Otherwise, continue on for a primer about how this shit even works.
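To get a feel for why small values go a long way, here is a minimal sketch of plain logit biasing. It assumes the bias is added directly to a token's logit before softmax; how NovelAI actually scales or applies the value internally isn't stated here, and the tokens and logits are made up for illustration.

```python
import math

def softmax(logits):
    """Turn a {token: logit} mapping into probabilities."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_bias(logits, biases):
    """Add each bias to the matching logit (assumed behaviour, not NovelAI's confirmed one)."""
    return {tok: val + biases.get(tok, 0.0) for tok, val in logits.items()}

# Hypothetical logits for three candidate next tokens.
logits = {" I": 1.0, " he": 2.0, " she": 2.0}

for bias in (0.0, 0.4, 1.0, 2.0):
    probs = softmax(apply_bias(logits, {" I": bias}))
    print(f"bias {bias:+.1f} -> P(' I') = {probs[' I']:.3f}")
```

Under this assumption, each +1.0 of bias multiplies the token's odds by e (about 2.7), which is why the jump from 1.0 to 2.0 feels far bigger than the jump from 0.0 to 0.4.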

The Menu


The phrase bias menu is in the right sidebar, underneath the context buttons in the Advanced tab.

Phrases
A "phrase" is a token or sequence of tokens whose bias value you can adjust to change its chance of showing up.

Entering words or sentences works like banned tokens: type a phrase of whatever length you want and press Enter to confirm it. Confirmed phrases can later be edited or removed by clicking on them. As with banned tokens, encasing your input in square brackets lets you enter token values, while encasing it in curly brackets enters the text as is, without any modification by the frontend. Unlike banned tokens, however, phrase bias allows duplicates of the exact same phrase.

What you enter by default is case-sensitive and contains a space. and  will be treated as two unique phrases, and the tokens they point to aren't ones used to start a new paragraph. If you don't want the space, use curly brackets.
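The three input styles can be sketched as a small classifier. This is an illustration only, not the frontend's real code: the function name, the comma-separated format for token values, and the reading of "contains a space" as a leading space are all assumptions.

```python
def parse_phrase(entry):
    """Classify a phrase-bias entry by its bracket style (hypothetical helper)."""
    if entry.startswith("[") and entry.endswith("]"):
        # Square brackets: raw token values. Comma separation is an assumption.
        ids = [int(part) for part in entry[1:-1].split(",") if part.strip()]
        return ("token_ids", ids)
    if entry.startswith("{") and entry.endswith("}"):
        # Curly brackets: text taken as is, with no space added.
        return ("literal", entry[1:-1])
    # Default: case is preserved and a leading space is added (assumed reading).
    return ("text", " " + entry)

print(parse_phrase("Hello"))    # default entry
print(parse_phrase("{Hello}"))  # literal entry
print(parse_phrase("[1, 2]"))   # token values
```

Because case is preserved, "Hello" and "hello" come out as different phrases in this sketch, matching the case-sensitivity described above.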

Bias
Increasing and decreasing the Bias value understandably increases or decreases the chance of the collection of phrases showing up in your outputs. As said by the devs, the value works logarithmically, which effectively means that the most drastic changes can be felt by only slightly nudging values.

Whatever value you think you need to set your phrase biases to in order to get more words to show up is probably too high. Instead, try adjusting the value in 0.1 increments and then getting more granular as you find the sweet spot. Compared to standard logit biasing, phrase biasing is also extremely context-aware. So far, I haven't needed to nudge things past 0.3 positive or negative, but this is all down to personal preference, and it's brain-dead easy to change if the AI isn't bringing up enough reptilian tails and scales.

There really isn't a good rule of thumb here about what value to set what kind of thing, and this is entirely up to your own personal preference. Some tokens you want to boost only need a little nudge while others need a bigger push, and it's the same when trying to cut back.

Experiment. The dial is there, so fiddle with it. You'll figure out what you like soon enough.

Bias Sets
Phrases are categorized into individual sets which you control the bias value applied to. For the sake of clarity, we'll refer to them as bias sets.

With the plus button, you can create a new bias set. All enabled bias sets are active at the same time, and if two sets contain the same word, the values assigned to that word stack on top of each other.
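Stacking can be pictured as a simple sum over enabled sets. The sketch below is an assumption-laden illustration: the dictionary layout and the function name are invented, and it takes "stack" to mean plain addition.

```python
from collections import defaultdict

def combine_bias_sets(bias_sets):
    """Sum the bias each phrase receives from every enabled set (hypothetical layout)."""
    totals = defaultdict(float)
    for bias_set in bias_sets:
        if not bias_set["enabled"]:
            continue  # disabled sets contribute nothing
        for phrase in bias_set["phrases"]:
            totals[phrase] += bias_set["bias"]
    return dict(totals)

sets = [
    {"enabled": True, "bias": 0.25, "phrases": [" scales", " tail"]},
    {"enabled": True, "bias": 0.25, "phrases": [" scales"]},
    {"enabled": False, "bias": 2.0, "phrases": [" tail"]},
]
print(combine_bias_sets(sets))
```

Here " scales" sits in two enabled sets and ends up with twice the bias of " tail", while the disabled set is ignored.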

Like a majority of other options NAI has to offer, you can export your entire collection of bias sets in a story to import into another story, appending those sets at the end of your list.

Ensure Completion After Start
The last two options, Ensure Completion After Start and Unbias When Generated, are self-explanatory, but I have a few thoughts on them based on preliminary testing.

Ensure Completion After Start means that whenever the AI generates the first token of a phrase, the subsequent tokens that make up the rest of the phrase are biased high enough to ensure the phrase is completed. This is a very extreme option based on my experiments, as it can cause the AI to bring up the phrase at incongruous times, so why would you ever enable it?
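One way to picture this option is as a check that runs before each new token: if the output currently ends partway through a phrase, the next token is forced. This is only a sketch of the idea, not NovelAI's implementation, and the token split for blippo is made up.

```python
def forced_next_token(generated, phrases):
    """Return the token that must come next, or None if no phrase is in progress."""
    for phrase in phrases:
        # Check the longest partial match first.
        for k in range(len(phrase) - 1, 0, -1):
            if len(generated) >= k and generated[-k:] == phrase[:k]:
                return phrase[k]
    return None

# Hypothetical tokenization of "blippo".
phrases = [["bl", "ip", "po"]]

print(forced_next_token(["the", "bl"], phrases))   # a phrase has been started
print(forced_next_token(["the", "cat"], phrases))  # nothing in progress
```

Once the first token has appeared, there is no way back out of the phrase in this sketch, which lines up with the incongruous completions described above.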

I see two possibilities: either normal bias operation isn't good enough for you (perhaps you're attempting to use an extremely uncommon name or word), or you don't want any other tokens to appear.

Consider my earlier bias set; increasing the bias to 0.5 makes my desired tokens often appear, but also has unintended consequences.

Without the option enabled, I didn't just get an increased chance for blippo, but all words that share the same starting tokens like blouse.

The same can be said for yingo, leading to yum and its variations showing up instead of my desired word.

So keep that in mind; '''increasing bias can increase the likelihood of a lot more tokens than just the ones you've entered. Clever utilization of stacking values can help you overcome this.'''
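The side effect comes from shared first tokens: boosting the opening token of a phrase also helps every other word that opens with it. A small sketch with invented token splits (the real tokenizer will split these words differently):

```python
# Hypothetical token splits, for illustration only.
vocab_words = {
    "blippo": ["bl", "ip", "po"],
    "blouse": ["bl", "ouse"],
    "yingo": ["y", "ingo"],
    "yum": ["y", "um"],
    "table": ["table"],
}

def words_boosted_by(phrase, words):
    """List other words whose first token matches the biased phrase's first token."""
    first = words[phrase][0]
    return sorted(w for w, toks in words.items() if toks[0] == first and w != phrase)

print(words_boosted_by("blippo", vocab_words))
print(words_boosted_by("yingo", vocab_words))
```

A negative bias on the unwanted neighbours in a second set is one way to use stacking against this, at the cost of more tuning.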

Unbias When Generated
This option removes the bias applied to a phrase once it has appeared during the current generation. This is certainly more useful in smaller generation sizes, where what little output you get is more useful when it isn't the same couple of tokens being repeated ad nauseam, but it can also serve to curtail more dramatic biasing efforts where you definitely want words to show up each generation but don't want them to show up more than once.

It's now enabled by default for good reason; without it, you'd be opening yourself up to egregious repetition that makes figuring out the right positive bias value quite annoying. Only disable it if persistent bias is what you want. For example, if you want to decrease the chance of full stops and paragraph breaks throughout a generation, turn this option off.
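The behaviour can be sketched as a filter applied before each new token: any phrase already present in the current output loses its bias. This works on plain text for simplicity, whereas the real feature operates on tokens; the function name and structure are assumptions.

```python
def active_biases(biases, generated_text, unbias_when_generated=True):
    """Return the biases still in effect given the text generated so far."""
    if not unbias_when_generated:
        return dict(biases)  # option off: every bias stays active
    # Option on: drop any phrase that has already shown up in this generation.
    return {p: b for p, b in biases.items() if p not in generated_text}

biases = {" scales": 0.3, " tail": 0.3}
so_far = "Her scales shimmered"

print(active_biases(biases, so_far))
print(active_biases(biases, so_far, unbias_when_generated=False))
```

With the option on, " scales" stops being pushed after its first appearance while " tail" is still boosted. With it off, both stay boosted for the whole generation, which is what you want for something like suppressing full stops.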

Miscellaneous

 * If you're scared of a particularly high set swarming over a majority of your outputs, one thing you could do is increase your Output Length while Unbias When Generated is enabled, to space out guaranteed words in the AI's text.
 * A cool thing you can do is make a functional 'whitelist', enabling Ensure Completion on a set with 0.0 Bias, and entering in phrases to make the AI only choose from those completions when it generates the first tokens.
 * Biases are handled in first-in, first-out order, so keep that in mind if you have similar starting tokens; the AI will most likely prefer the oldest one to generate, unless the situation really wouldn't work with it.