Updates
v2 \
With shorthand v2, the system moves from one-character codes to two-character codes for each effect. This makes the letter combinations easier to remember and opens up more "slots" for future effects.
Shorthand v2 only works in streams that use Bikubot v0.4.301 or later. If you want v1, you can find it here:
Shorthand v1
Below you can also find the 6 new effects added in this version. These effects are applied in a second pass on the TTS audio returned by AWS Polly.
- _Modifications_
- _Echo_
- _Megaphone_
- _Minified_
- _Muffler_
- _Reverb_
- _Robot_
---
# Shorthand SSML for Bikubot
- Shorthand SSML for Bikubot
- _What is this_
- _How it works_
- _Short Notes_
- _Modifications_
- _Break_
- _Emphasis_
- _Echo_
- _Expletive/Beep_
- _IPA (International Phonetic Alphabet)_
- _Language_
- _Max Duration_
- _Megaphone_
- _Minified_
- _Muffler_
- _Pitch_
- _Soft_
- _Rate_
- _Reverb_
- _Robot_
- _Timbre_
- _Volume_
- _Whisper_
- Special Effects
- _Breath_
- _Tones_
---
_What is this_
This is a custom, shortened way to control the TTS voices of Bikubot. It uses AWS Polly SSML tags to control how the voice sounds, but shortens and simplifies the tags to make them quicker and easier to use.
_How it works_
Any change to how something is spoken starts with _#_, followed by the modifications you want to apply to the voice. Each modification is represented by a two-letter code (for example, _pi_ for pitch), and some modifications also need a number to set the scale of the modification. Finally, the text you want the modification to apply to is enclosed in _[ and ]_. Because of this, the characters [ and ] are reserved, and if they are used within a voice modification they must form a matching pair. \
For example, the SSML _<prosody pitch="+50%" rate="200%">This is a test</prosody>_ would in shorthand be _#pi150ra200[this is a test]_. Note that the mapping is not one-to-one for some things: pitch in normal SSML goes between -30 and +50, but shorthand only works with positive numbers, so a conversion is done where the shorthand starts at 100 for pitch instead of 0. \
You can also mix any modifications. For example, if you wanted to add a whisper to the example above, the shorthand would be _#whpi150ra200[this is a test]_. The order of the modification codes does not matter, so you could write it as _#pi150whra200[this is a test]_ and it would work the same. \
If you use the same modification more than once in the same _tag_, as in _#whra20ra200[this is a test]_, only the last occurrence in the tag is used. In this case the result is the same as _#whra200[this is a test]_, and the ra20 is thrown away. \
The shorthand also supports nested tags, so you could do something like _#pi150[this is a #wh[test]]_. All modifications are also case insensitive, so _#PI150LA(Sv-Se)[test]_ is the same as _#pi150la(sv-se)[test]_. \
The bot also does its best to fix any issues. For example, if a value is too high, it will be set to the highest possible value for that modification. \
The possible modifications and their values can be found next.
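
To make the rules above concrete, here is a minimal Python sketch of how the modification part of a tag could be read. It is not Bikubot's actual parser: the function name `parse_modifications`, the regular expression, and the way values are stored are all assumptions made for illustration. Only the two-letter codes come from this document.

```python
import re
from typing import Optional

# Two-letter codes taken from the modification list in this document.
KNOWN_CODES = {"br", "em", "ec", "ex", "ip", "la", "me", "du", "mi", "mu",
               "pi", "so", "ra", "re", "ro", "ti", "vo", "wh"}

# A code followed by an optional value: "(...)", "++", "--", "+", "-" or a number.
TOKEN = re.compile(r"([a-z]{2})(\([^)]*\)|\+\+|--|\+|-|\d*\.?\d+)?", re.IGNORECASE)


def parse_modifications(header: str) -> dict[str, Optional[str]]:
    """Read the part between '#' and '[' (hypothetical helper)."""
    mods: dict[str, Optional[str]] = {}
    i = 0
    while i < len(header):
        m = TOKEN.match(header, i)
        if m and m.group(1).lower() in KNOWN_CODES:
            code = m.group(1).lower()          # codes are case insensitive
            value = m.group(2)
            if value and value.startswith("("):
                value = value[1:-1]            # drop the surrounding ( )
            mods[code] = value                 # a later duplicate overwrites an earlier one
            i = m.end()
        else:
            i += 1                             # skip characters that are not a modification
    return mods


print(parse_modifications("whra20ra200"))
print(parse_modifications("PI150LA(Sv-Se)"))
```

In this sketch the duplicate `ra20` is overwritten by `ra200`, and the order of the codes has no influence on the resulting dictionary, which mirrors the behaviour described above.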
---
_Short Notes_
* A voice modification starts with # followed by one or more of the modifications found below, and ends with the speech you want modified enclosed in [ and ].
* The characters [ and ] are reserved. If they are used outside their intended use case (marking what to modify), they need to be used in pairs.
* You can do nested modifications.
  * Example:
    * #pi150[this is a nested pitch #wh[whisper test]]
    * #pi150[this is #wh[deeply #ra120so[nested and #ti120[going deeper], and] now] back up]
    * #vo11[#wh[testing #so[softly] whispering] with a bit higher volume, #ti50[ending with some timbre]]
* You can add more than one modification per voice modification. The order does not matter.
  * Example:
    * #pi150wh[this is a modified pitch with whisper]
    * #whsoti50la(sv-se)[this soft and whispering swedish language voice with modified timbre]
    * #br.5ti50pi150ra180[This starts with a 0.5s break and modified pitch, rate and timbre]
* The modification part is case insensitive.
* Any modification value outside its min or max range will be set to its min or max (whichever is closest).
* Any modification value that is not valid will be set to a normalized default value.
* Any characters in the modification part that do not represent a modification will be ignored.
* A faulty voice modification, such as one with a space in the modification part or one that is not correctly enclosed, will be read as normal text.
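
The two notes on out-of-range and invalid values can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the bot's code: the helper name `normalize` and the `RANGES` table layout are assumptions, while the min, max and default numbers are copied from the Pitch, Timbre, Volume, Break and Max Duration sections of this document. Preset characters such as `++` are not handled here.

```python
# (min, max, default) per code, copied from the sections of this document.
RANGES = {
    "pi": (70, 150, 100),
    "ti": (50, 200, 100),
    "vo": (4, 14, 10),
    "br": (0.0, 10.0, 1.0),
    "du": (0.0, 60.0, 1.0),
}


def normalize(code: str, raw: str) -> float:
    """Clamp a numeric value into its range, or fall back to the default (hypothetical helper)."""
    low, high, default = RANGES[code]
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default                     # not a valid number: use the default
    return max(low, min(high, value))      # too low or too high: use the closest bound


print(normalize("pi", "999"))
print(normalize("vo", "abc"))
```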
---
_Modifications_
---
_Break_
Break is represented by the code _br_ and supports either a following numeric value or _+ , ++ , - , --_. The SSML equivalent is the _<break time="">_ tag for numeric values and the _<break strength="">_ tag for the preset characters. The break happens before any given text, if there is any in the enclosing _[]_.
* Effect: Creates a break in the speech at the given point of the tag for the given amount of time in seconds.
* Characters: \
  These represent the same preset values that normal SSML has.
  * ++ = x-strong
  * + = strong
  * - = weak
  * -- = x-weak
* Numeric:
  * default: 1.0
  * max: 10.0
  * min: 0.0
* Example:
  * Characters: \
    _#br+[]_ is equal to _<break strength="strong" />_
  * Numeric: \
    _#br1.2[A test]_ is equal to _<break time="1200ms" />A test_ \
    _#br.5[]_ is equal to _<break time="500ms" />_
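
As a small worked example of the numeric form, the following Python sketch converts a break value in seconds into the millisecond form used by the SSML _<break time="">_ tag. The function name `break_tag` is hypothetical, and the clamping to 0.0 to 10.0 simply reuses the min and max listed above.

```python
def break_tag(seconds: float) -> str:
    """Build a <break> tag from a shorthand value in seconds (hypothetical helper)."""
    seconds = max(0.0, min(10.0, seconds))      # keep within the documented min/max
    return f'<break time="{round(seconds * 1000)}ms" />'


print(break_tag(1.2))
print(break_tag(0.5))
```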
---
_Emphasis_
Emphasis is represented by the code _em_ and needs a following _- , + , ++_. The SSML equivalent is the _<emphasis level="moderate">_ tag.
* Effect: Tries to emphasize or de-emphasize the word/sentence.
* Characters: \
  These represent the same preset values that normal SSML has.
  * ++ = strong
  * + = moderate
  * - = reduced
* Example: \
  _#em++[A test]_ is equal to _<emphasis level="strong">A test</emphasis>_ \
  _#em-[A test]_ is equal to _<emphasis level="reduced">A test</emphasis>_
---
_Echo_
Echo is a secondary effect, meaning it is added on after the TTS is generated.
Echo is represented by the code _ec_ and needs a following number between 1 and 6 for the strength of the echo effect. There is no SSML equivalent.
* Effect: Adds an echo effect at the chosen level.
* Numeric:
  * default: 2
  * max: 6
  * min: 1
* Example: \
  _#ec4[A test]_
---
_Expletive/Beep_
Expletive/beep is represented by the code _ex_ and does not need any additional data. The SSML equivalent is the _<say-as interpret-as="expletive">_ tag.
* Effect: Beeps out the content.
* Example: \
  _#ex[A test]_ is equal to _<say-as interpret-as="expletive">A test</say-as>_
---
_IPA (International Phonetic Alphabet)_
IPA is represented by the code _ip_, followed by the phonetic symbols for the pronunciation enclosed in _()_. The SSML equivalent is the _<phoneme alphabet="ipa" ph="">_ tag.
* Effect: Changes how the word(s) enclosed in _[]_ are spoken.
* Example: \
  _#ip(pɪˈkɑːn)[pecan]_ is equal to _<phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>_
---
_Language_
Language is represented by the code _la_, followed by the language code for the language you want to use enclosed in _()_. The SSML equivalent is the _<lang xml:lang="fr-FR">_ tag.
* Effect: Changes what language the voice will use to try to speak the words.
* Language codes:

| Language | Code | Language | Code | Language | Code |
| --- | --- | --- | --- | --- | --- |
| Arabic | arb | Arabic (Gulf) | ar-ae | Catalan | ca-es |
| Chinese (Cantonese) | yue-cn | Chinese (Mandarin) | cmn-cn | Danish | da-dk |
| Dutch | nl-nl | English (Australian) | en-au | English (British) | en-gb |
| English (Indian) | en-in | English (New Zealand) | en-nz | English (South African) | en-za |
| English (US) | en-us | English (Welsh) | en-gb-wls | Finnish | fi-fi |
| French | fr-fr | French (Canadian) | fr-ca | Hindi | hi-in |
| German | de-de | German (Austrian) | de-at | Icelandic | is-is |
| Italian | it-it | Japanese | ja-jp | Korean | ko-kr |
| Norwegian | nb-no | Polish | pl-pl | Portuguese (Brazilian) | pt-br |
| Portuguese (European) | pt-pt | Romanian | ro-ro | Russian | ru-ru |
| Spanish (European) | es-es | Spanish (Mexican) | es-mx | Spanish (US) | es-us |
| Swedish | sv-se | Turkish | tr-tr | Welsh | cy-gb |

* Example: \
  _#la(ja-jp)[A test]_ is equal to _<lang xml:lang="ja-JP">A test</lang>_ \
  _#la(en-us)[A test]_ is equal to _<lang xml:lang="en-US">A test</lang>_
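
The examples show lowercase shorthand codes turning into the mixed-case form used in the _xml:lang_ attribute. A minimal Python sketch of that conversion follows. The function names `to_xml_lang` and `lang_tag` are hypothetical, and the rule used (first part lowercase, remaining parts uppercase) is an assumption inferred from the two examples, not a description of the bot's implementation.

```python
def to_xml_lang(code: str) -> str:
    """Turn a shorthand code such as 'sv-se' into 'sv-SE' (hypothetical helper)."""
    first, *rest = code.strip().split("-")
    # Assumed rule: language part lowercase, every following part uppercase.
    return "-".join([first.lower()] + [part.upper() for part in rest])


def lang_tag(code: str, text: str) -> str:
    """Wrap text in a <lang> tag for the given shorthand language code."""
    return f'<lang xml:lang="{to_xml_lang(code)}">{text}</lang>'


print(lang_tag("ja-jp", "A test"))
print(to_xml_lang("en-gb-wls"))
```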
---
_Megaphone_
Megaphone is a secondary effect, meaning it is added on after the TTS is generated.
Megaphone is represented by the code _me_ and needs a following numeric selector (1–3). There is no SSML equivalent.
* Effect: Applies a megaphone effect at the chosen level.
* Numeric:
  * default: 1
  * max: 2
  * min: 1
* Example: \
  _#me2[A test]_
---
_Max Duration_
Max duration is represented by the code _du_ and needs a following numeric value. The SSML equivalent is the _<prosody amazon:max-duration="">_ tag. There are limits on how much the speech can be sped up, and if it already fits within the duration no changes are made.
* Effect: Tries to speed up the speech so it fits within the given time in seconds.
* Numeric:
  * default: 1.0
  * max: 60.0
  * min: 0.0
* Example: \
  _#du5.3[A test]_ is equal to _<prosody amazon:max-duration="5300ms">A test</prosody>_ \
  _#du.5[A test]_ is equal to _<prosody amazon:max-duration="500ms">A test</prosody>_
---
_Minified_
Minified is a secondary effect, meaning it is added on after the TTS is generated.
Minified is represented by the code _mi_. There is no SSML equivalent.
* Effect: Applies a "minified" effect, where it sounds like the speech is coming from something small.
* Numeric:
  * Fixed level: 1
* Example: \
  _#mi[A test]_
---
_Muffler_
Muffler is a secondary effect, meaning it is added on after the TTS is generated.
Muffler is represented by the code _mu_ and needs a following numeric strength selector (1–3). There is no SSML equivalent.
* Effect: Applies a muffling effect at the chosen level.
* Numeric:
  * default: 1
  * max: 3
  * min: 1
* Example: \
  _#mu2[A test]_
---
_Pitch_
Pitch is represented by the code _pi_ and supports either a following numeric value or _+ , ++ , - , --_. The SSML equivalent is the _<prosody pitch="">_ tag.
* Effect: Changes the pitch at which the words are spoken.
* Characters: \
  These represent the same preset values that normal SSML has.
  * ++ = x-high
  * + = high
  * - = low
  * -- = x-low
* Numeric:
  * default: 100
  * max: 150
  * min: 70
* Example:
  * Characters: \
    _#pi++[A test]_ is equal to _<prosody pitch="x-high">A test</prosody>_
  * Numeric: \
    _#pi150[A test]_ is equal to _<prosody pitch="+50%">A test</prosody>_
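
Because the shorthand pitch scale starts at 100 while SSML pitch is a signed percentage around 0, the numeric conversion is a simple offset. The Python sketch below illustrates the arithmetic. The function name `pitch_to_ssml` is hypothetical, and the signed output format follows the `+50%` form used in the How it works example.

```python
def pitch_to_ssml(value: int) -> str:
    """Convert a shorthand pitch value (70 to 150) into an SSML percentage (hypothetical helper)."""
    value = max(70, min(150, value))    # documented min and max
    return f"{value - 100:+d}%"         # 100 in shorthand corresponds to no change


for shorthand in (70, 100, 150):
    print(shorthand, "->", pitch_to_ssml(shorthand))
```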
---
_Soft_
Soft speech is represented by the code _so_ and does not need any additional data. The SSML equivalent is the _<amazon:effect phonation="soft">_ tag.
* Effect: Makes the speech sound softer.
* Example: \
  _#so[A test]_ is equal to _<amazon:effect phonation="soft">A test</amazon:effect>_
---
_Rate_
Rate is represented by the code _ra_ and supports either a following numeric value or _+ , ++ , - , --_. The SSML equivalent is the _<prosody rate="">_ tag.
* Effect: Changes the speed at which the words are spoken.
* Characters: \
  These represent the same preset values that normal SSML has.
  * ++ = x-fast
  * + = fast
  * - = slow
  * -- = x-slow
* Numeric:
  * default: 100
  * max: 200
  * min: 20
* Example:
  * Characters: \
    _#ra--[A test]_ is equal to _<prosody rate="x-slow">A test</prosody>_
  * Numeric: \
    _#ra150[A test]_ is equal to _<prosody rate="150%">A test</prosody>_
---
_Reverb_
Reverb is a secondary effect, meaning it is added on after the TTS is generated.
Reverb is represented by the code _re_ and needs a following numeric strength selector (1–3). There is no SSML equivalent.
* Effect: Adds reverb at the chosen level.
* Numeric:
  * default: 1
  * max: 3
  * min: 1
* Example: \
  _#re3[A test]_
---
_Robot_
Robot is a secondary effect, meaning it is added on after the TTS is generated.
Robot is represented by the code _ro_ and needs a following numeric selector (1–3). There is no SSML equivalent.
* Effect: Applies a robotic effect at the chosen level.
* Numeric:
  * default: 1
  * max: 3
  * min: 1
* Example: \
  _#ro2[A test]_
---
_Timbre_
Timbre is represented by the code _ti_ and supports either a following numeric value or _+ , ++ , - , --_. The SSML equivalent is the _<amazon:effect vocal-tract-length="">_ tag.
* Effect: Changes the timbre of the voice.
* Characters:
  * ++ = 200%
  * + = 150%
  * - = 75%
  * -- = 50%
* Numeric:
  * default: 100
  * max: 200
  * min: 50
* Example:
  * Characters: \
    _#ti--[A test]_ is equal to _<amazon:effect vocal-tract-length="50%">A test</amazon:effect>_
  * Numeric: \
    _#ti50[A test]_ is equal to _<amazon:effect vocal-tract-length="50%">A test</amazon:effect>_
---
_Volume_
Volume is represented by the code _vo_ and supports either a following numeric value or _+ , ++ , - , --_. The SSML equivalent is the _<prosody volume="">_ tag.
* Effect: Changes the volume of the speech.
* Characters: \
  These represent the same preset values that normal SSML has.
  * ++ = x-loud
  * + = loud
  * - = soft
  * -- = x-soft
* Numeric:
  * default: 10
  * max: 14
  * min: 4
* Example:
  * Characters: \
    _#vo+[A test]_ is equal to _<prosody volume="loud">A test</prosody>_
  * Numeric: \
    _#vo4[A test]_ is equal to _<prosody volume="-6dB">A test</prosody>_
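
The numeric volume scale can be read as an offset in decibels around the default of 10. The Python sketch below shows that reading. Note the assumption: the one-step-per-decibel mapping is inferred only from the default (10) and the single example above (4 gives -6dB), and is not confirmed by this document. The function name `volume_to_ssml` is hypothetical.

```python
def volume_to_ssml(value: int) -> str:
    """Convert a shorthand volume value (4 to 14) into a dB string (hypothetical helper)."""
    value = max(4, min(14, value))      # documented min and max
    # Assumed mapping: each step away from the default of 10 is 1 dB.
    return f"{value - 10:+d}dB"


for shorthand in range(4, 15):
    print(shorthand, "->", volume_to_ssml(shorthand))
```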
---
_Whisper_
Whisper is represented by the code _wh_ and does not need any additional data. The SSML equivalent is the _<amazon:effect name="whispered">_ tag.
* Effect: Makes the words be spoken in a whispering voice.
* Example: \
  _#wh[A test]_ is equal to _<amazon:effect name="whispered">A test</amazon:effect>_
---
Special Effects
There are a few special effects that the shorthand supports. These sounds are represented by the effect name enclosed in _--_ , like _--effectname--_ . Some of these will be affected by modifications, as they are created with SSML and TTS; if so, it will be noted. The plan for the future is to allow streamers to add their own sounds to this system. These are all case insensitive.
---
_Breath_
These are created using the SSML _<amazon:breath>_ tag. All breaths use a raised volume (_loud_ or _x-loud_) for a chance to be heard.
* _--BXL--_ = <amazon:breath duration="x-long" volume="x-loud"/>
* _--BL--_ = <amazon:breath duration="long" volume="x-loud"/>
* _--B--_ = <amazon:breath duration="medium" volume="loud"/>
* _--BS--_ = <amazon:breath duration="short" volume="loud"/>
* _--BXS--_ = <amazon:breath duration="x-short" volume="loud"/>
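
The list above is effectively a lookup table, so expanding the breath tokens can be sketched as a case-insensitive search and replace. The Python below is an illustration only: the names `BREATHS` and `expand_breaths` are hypothetical, and leaving unknown `--name--` tokens untouched is an assumption about how unrecognised effects are treated.

```python
import re

# Token -> (duration, volume), copied from the list above.
BREATHS = {
    "BXL": ("x-long", "x-loud"),
    "BL": ("long", "x-loud"),
    "B": ("medium", "loud"),
    "BS": ("short", "loud"),
    "BXS": ("x-short", "loud"),
}

EFFECT = re.compile(r"--([A-Za-z]+)--")


def expand_breaths(text: str) -> str:
    """Replace --B-- style tokens with <amazon:breath> tags (hypothetical helper)."""
    def replace(match: re.Match) -> str:
        entry = BREATHS.get(match.group(1).upper())   # tokens are case insensitive
        if entry is None:
            return match.group(0)                     # assumed: unknown names are left as-is
        duration, volume = entry
        return f'<amazon:breath duration="{duration}" volume="{volume}"/>'
    return EFFECT.sub(replace, text)


print(expand_breaths("Hello --b-- world"))
```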
---
_Tones_
The following cheat sheet uses the old one-character system. If you use it, make sure to translate it to the two-character system, so #expixx instead of #epxx.
This is a cheat sheet on how to create tone-like sounds using the Expletive/Beep tag. It was created by community member Nowrench.
[Image: TTS-Melody-Guide]
---