Guidelines and rules

General principles

In this section, I will explain the overarching principles behind ROR guidelines.

Attested Romance orthographic practices

The central idea behind ROR is to reintegrate the orthography of Romance minority languages through the implementation of orthographic conventions used in Portuguese, Spanish, Catalan, Occitan, French, Italian and, to a lesser extent, Romanian.

This is achieved by acknowledging what these orthographic conventions are and by understanding what their etymological role is within the system they are borrowed from. A further recommendation is to avoid letters and multigraphs (di-, tri- and quadrigraphs) that are not used in the languages of reference. The use of ⟨w⟩ and ⟨k⟩ is especially discouraged, unless finding another solution proves unworkable.

This idea is also valid for diacritics. For example, the use of double dots above a character as a diacritic (⟨ä, ë, ï, ö, ü⟩) should be reserved for use as a diaeresis rather than an umlaut, i.e. it should not indicate a change in vowel quality, unless finding another option is difficult.

Etymological criterion

The details of the so-called etymological criterion are better explained in §ROR prescriptions. The main idea is that the reflexes and evolution of certain Latin consonants and consonant clusters determine how certain sounds are represented. These newly established spelling conventions can then be extended to non-etymological contexts. Therefore, a ROR orthography ought not to be exclusively historical: etymology is just the method used to justify spelling choices in certain contexts so that they can be used non-etymologically—according to the restrictions that will be explained below.

I will give an example. Old Spanish spelt [ʃ] as ⟨x⟩ and [ʒ] as ⟨j⟩ or ⟨g⟩. Over the centuries, these sounds merged into the sound [x], and the distinction between them was lost. To reflect this merger, it was decided that ⟨j⟩ should be used for the sound [x] where ⟨x⟩ was used in the past (in contrast, the use of ⟨g⟩ remained the same, but that falls outside the scope of this example). In this case, the use of ⟨j⟩ was extended to non-etymological contexts—an example of how a historically motivated choice can develop into a general spelling rule, within the limits of ROR.

Reader-oriented approach and orthographic depth

Depending on the language’s phonological history, a reintegrationist orthography may turn out to be historical to various degrees, and therefore more or less difficult to write in. For example, compare the Catalan words cena “dinner” (an archaic term) and sena “number six in a game of dice, sice”. Both words are pronounced the same, but their initial consonant is spelt differently according to etymology. This makes it more difficult for the writer to predict or recall the exact spelling of a word.

Romance reintegrationism favours the reader over the writer. That is, while a single phoneme may be represented by multiple graphemes, each grapheme should ideally correspond to only one phoneme. It may not be possible to implement this principle fully; how far it can be applied depends mostly on how complex the phonemic inventory of the language is. Sacrificing this principle might be needed to make the orthography less crowded with diacritics (see §Economy) and less cumbersome for both the reader and the writer.
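
The asymmetry between reader and writer can be checked mechanically. The following is a minimal Python sketch, assuming a toy list of grapheme-phoneme pairs made up for illustration; the names CORRESPONDENCES, ambiguous_graphemes and graphemes_per_phoneme are hypothetical and not part of ROR. Context (such as "before e, i") is ignored here for simplicity.

```python
from collections import defaultdict

# Hypothetical (grapheme, phoneme) pairs for a toy orthography.
# Context such as "before e, i" is deliberately left out of this sketch.
CORRESPONDENCES = [
    ("c", "s"),  # compare Catalan cena
    ("s", "s"),  # compare Catalan sena
    ("x", "ʃ"),
]


def ambiguous_graphemes(pairs):
    """Graphemes with more than one reading: a burden on the reader."""
    readings = defaultdict(set)
    for grapheme, phoneme in pairs:
        readings[grapheme].add(phoneme)
    return {g: sorted(p) for g, p in readings.items() if len(p) > 1}


def graphemes_per_phoneme(pairs):
    """Spellings available for each phoneme: a burden on the writer only."""
    spellings = defaultdict(set)
    for grapheme, phoneme in pairs:
        spellings[phoneme].add(grapheme)
    return {p: sorted(g) for p, g in spellings.items()}
```

An empty result from ambiguous_graphemes means the toy inventory is reader-friendly in the sense described above, even if graphemes_per_phoneme shows several spellings for one phoneme.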

Economy

An important rule of ROR is the avoidance of double marking: a particular element or feature cannot be indicated more than once. As an example, Bolognese Emilian [mʌnd] “world” is written månnd. The double ⟨n⟩ marks the preceding vowel as short, but [ʌ] ⟨å⟩ is always short, so there is no need for a double ⟨n⟩.

ROR also discourages the extensive use of diacritics, unless they are well established in the reference languages mentioned above. This ties in nicely with the principle of avoidance of double marking: stress is sometimes overmarked in the orthographies of RMLs. One way of preventing this is to establish rules for a default interpretation of stress placement in an accent-less word; only those words which defy the established rules will be marked with an accent (acute or grave).
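
The idea of a default stress rule can be sketched in a few lines of Python. The default chosen here (penultimate stress for words ending in a vowel, final stress otherwise) is only an assumption for illustration, as are the function names and the toy syllables; each language would need its own rule.

```python
VOWEL_LETTERS = set("aeiou")


def default_stress_index(syllables):
    """Hypothetical default: penultimate stress if the word ends in a
    vowel letter, final stress otherwise."""
    if len(syllables) == 1:
        return 0
    ends_in_vowel = syllables[-1][-1] in VOWEL_LETTERS
    return len(syllables) - 2 if ends_in_vowel else len(syllables) - 1


def needs_accent(syllables, stressed_index):
    """An accent is written only when the word defies the default rule."""
    return stressed_index != default_stress_index(syllables)


# Toy syllables, not real words.
print(needs_accent(["pa", "ta"], 0))   # follows the default, no accent needed
print(needs_accent(["pa", "ta"], 1))   # defies the default, accent needed
```

Under this design, stress is never marked twice: the accent appears only where the default reading would be wrong.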

ROR prescriptions

Here, the word “prescription” is used to define guidelines that are not considered general principles behind ROR, and which are more practical and specific in nature. These prescriptions are rooted in Latin etymology, but—as already stated—etymology should not be overly dominant in a Romance reintegrationist orthography. Remember that these prescriptions are suggestive in nature and allow for flexibility. I will use the symbol ‘$’ to indicate proposed spellings. These are merely examples used to illustrate ROR prescriptions and should not be interpreted as formal orthographic proposals.

Some of these ideas can contradict each other, so choosing one option over another is definitely acceptable. If a prescription is essential for creating a ROR orthography, it will be stated.

Palatal and palatalised sounds

Latin yod

“Yod” here refers to the phone [j].

The modern reflex of word-initial Latin yod at the beginning of a stressed syllable will determine the use of ⟨j⟩. For example, since Latin *iocat (“he/she plays”) evolved into Venetian [ˈzoga], that initial Venetian [z] can be written ⟨j⟩ (for example $joga).

The Italian route. The evolution of yod in this context usually coincides with the evolution of the prevocalic Latin sequences -gi- and -di-. Therefore, the resulting sound from this merger can be spelt the same as these sequences even in contexts where it comes from yod. This is the convention adopted in Italian.

hodiē “today” → It. oggi

iūnium “June” → It. giugno

medium “middle” → Venetian [ˈmɛzo] $megio, mejo, mexo, etc.

*iocat → Venetian [ˈzoga] $gioga, joga, xoga, etc.

The Catalan route. The general rule for this in Catalan is different. I will explain it to the best of my abilities, but note that the following is an oversimplification. Catalan treats the [(d)ʒ] sound that came from Latin yod the same as Latin g before e and i. In most cases, [(d)ʒ] is written ⟨g⟩ before ⟨e, i⟩ and ⟨j⟩ before ⟨a, o, u⟩. This means that, if Latin yod has the same reflex as g before e and i, then the two can be treated as they are in Catalan: said reflex, which in Catalan is [(d)ʒ], is written ⟨g⟩ before ⟨e, i⟩ and another choice has to be made before other vowel letters.

gentem “people” → Cat. [(d)ʒen(t)] gent

iūnium “June” → Cat. [(d)ʒuɲ] juny

medium “middle” → Ven. [ˈmɛzo] $megio, mejo

generum “son-in-law” → Ven. [ˈzɛnero] $gènero
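
The Catalan-route rule lends itself to a short sketch. The Python below is an illustration only, assuming hypothetical helper names (base_letter, spell_palatal_g) and leaving the choice before ⟨a, o, u⟩ as a parameter, since the text says that choice is open.

```python
import unicodedata


def base_letter(ch):
    """Strip diacritics so that e.g. 'è' counts as 'e'."""
    return unicodedata.normalize("NFD", ch)[0].lower()


def spell_palatal_g(following_letter, before_back="j"):
    """Grapheme for the shared reflex of Latin yod and g before e, i.

    'g' before e, i; before other vowel letters the orthographer's own
    choice is used (default 'j', as in Catalan).
    """
    if base_letter(following_letter) in ("e", "i"):
        return "g"
    return before_back


# Reproduces the consonant choices in gent, juny, $gènero, $mejo and $megio.
print(spell_palatal_g("e") + "ent")
print(spell_palatal_g("u") + "uny")
print(spell_palatal_g("è") + "ènero")
print("me" + spell_palatal_g("o") + "o")
print("me" + spell_palatal_g("o", before_back="gi") + "o")
```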

Modern yod: the use of ⟨j⟩ vs ⟨y⟩ vs ⟨i⟩

Different Romance languages have different traditions for writing the sound [j].

Some do not distinguish it from [i] at all in writing, such as modern Italian, Catalan, Portuguese and Romanian; so that is certainly an option for a ROR orthography, if no or little ambiguity arises from such a choice. Other languages, however, do differentiate [j] in writing. Older Italian and many Romance minority languages in Italy traditionally use ⟨j⟩ for [j], while French and Spanish use ⟨y⟩. The preference for ⟨i⟩, ⟨j⟩ or ⟨y⟩ relies mainly on the interconnected strategies adopted for handling Latin yod (see §Latin yod) and the treatment of Latin ⟨c⟩ and ⟨g⟩ before front vowels (see §Latin c and g before front vowels).

Latin c and g before front vowels

Probably the most important choice to make when creating a reintegrationist orthography is how the letters ⟨c⟩ and ⟨g⟩ are handled. The following suggestions are especially recommended.

Before ⟨a, o, u⟩, the sound [k] should always be written with the letter ⟨c⟩, and the sound [g] should always be written with the letter ⟨g⟩. [k, g] should be written ⟨c, g⟩ also at the end of a word—unless some kind of liaison is present, in which case it is advisable to use a system inspired by French.

The use of the sequences ⟨ce, ci⟩ and ⟨ge, gi⟩ depends on the evolution of Latin c and g before e and i at the beginning of word-initial, stressed syllables. These will be referred to as palatal ⟨c⟩ and palatal ⟨g⟩, regardless of whether their realisations in modern languages have a palatal place of articulation. Take for example French cent and sent. Much like the Catalan examples cena and sena above, they are pronounced the same, but the initial [s] is represented according to its origin in Latin. A ROR orthography should reflect this: palatal ⟨c⟩ and palatal ⟨g⟩ before ⟨e, i⟩ cannot be represented with letters other than ⟨c⟩ and ⟨g⟩.

Many RMLs do not represent this faithfully, since their orthography is based on the way sounds are represented in the majority language of the state or nation where they are spoken. See the following examples.

Current RML Orthography	Proposed Spelling	Latin Word of Origin
Jèrriais Norman chent [ʃɑ̃] “hundred”	$cent	centum
Walloon djins [dʒɛ̃] “person”	$gens	gentēs
Genoese Ligurian Zêna [ˈzeːna] “Genoa”	$Gena	Genua
Rumantsch tschintg [ˈtʃinc] “five”	$cintg	*cinque ← quinque

The ⟨s⟩ rule

It is recommended that a sound derived from Latin s be written with ⟨s⟩ or ⟨ss⟩, especially when the sound is [s] or [z]. However, this is a flexible guideline that may be disregarded in cases where following it would result in an orthographic form that makes it difficult for the reader to predict the pronunciation of a word.

The velar table

A way to write [k] and [g] before front vowels must be found. Strategies must also be developed to represent the sounds written with palatal ⟨c⟩ and ⟨g⟩ when they occur before back vowels.

I devised the following table which helps with creating said conventions, and which should be valid for all languages of reference and ROR orthographies. To illustrate how the table works, I will take Catalan as an example.

	Non-palatal ([k, g])		Palatal ([s, dʒ])
	before ⟨e, i⟩	before ⟨a, o, u⟩ and word finally	before ⟨e, i⟩	before ⟨a, o, u⟩ and word finally
Latin c	⟨qu⟩	⟨c⟩	⟨c⟩	⟨ç⟩/⟨ss⟩
Latin g	⟨gu⟩	⟨g⟩	⟨g⟩	⟨j⟩

There are two prominent strategies to write the sounds [k] and [g] before ⟨e, i⟩.

The first is the use of ⟨qu⟩ and ⟨gu⟩ (e.g. as in Catalan), while the second is the use of ⟨ch⟩ and ⟨gh⟩ (as in Italian, for example). This choice rests on the language’s phonotactics and sound inventory. If the language has the sequences [kw] and [gw], the orthographer might want to use ⟨qu⟩ and ⟨gu⟩ to represent them, reserving ⟨ch⟩ and ⟨gh⟩ for [k] and [g] before ⟨e, i⟩.

If the language does not display frequent use of sequences [kw, gw], the choice is a little freer. Note that a diaeresis can be used to distinguish [kw, gw] from [k, g] if needed or desired (⟨qü, gü⟩), much as in Catalan; another option is the Spanish strategy of contrasting ⟨qu⟩ [k] vs ⟨cu⟩ [kw].
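
The strategies for velars before ⟨e, i⟩ can be laid side by side as data. The sketch below is a hedged illustration in Python: the strategy labels and the function name are invented for this example, and the "spanish_like" entry covers only the [k] vs [kw] contrast mentioned in the text.

```python
# Spellings for velar sounds before the letters e and i, one dict per strategy.
# The strategy names are labels chosen for this sketch only.
STRATEGIES = {
    "catalan_like": {"k": "qu", "g": "gu", "kw": "qü", "gw": "gü"},
    "italian_like": {"k": "ch", "g": "gh", "kw": "qu", "gw": "gu"},
    "spanish_like": {"k": "qu", "kw": "cu"},
}


def spell_before_front_vowel(sound, vowel, strategy):
    """Return grapheme + vowel letter for a velar sound before e or i."""
    if vowel not in ("e", "i"):
        raise ValueError("this sketch only covers the letters e and i")
    table = STRATEGIES[strategy]
    if sound not in table:
        raise ValueError(f"{strategy} has no entry for [{sound}] in this sketch")
    return table[sound] + vowel


print(spell_before_front_vowel("k", "e", "catalan_like"))
print(spell_before_front_vowel("kw", "e", "catalan_like"))
print(spell_before_front_vowel("k", "i", "italian_like"))
```

Keeping each strategy as a separate table makes it easy to see that no strategy reuses one grapheme for two sounds in this context.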

One consideration to make is whether the digraph ⟨ch⟩, which is quite common among the languages of reference, would be more useful for representing another phoneme or sound entirely. It most commonly represents some kind of palatalisation before ⟨a, o, u⟩: see French chat (← cattum) and Portuguese chorar (← plōrāre). Selecting ⟨ch⟩ to represent [k] rather than some kind of palatalised sound depends on the orthographer and, most importantly, on the language’s sound inventory: it is necessary to evaluate what graphemes the RML needs to represent all of its phonemes.

In order to write the sounds typically associated with palatal ⟨c⟩ and ⟨g⟩ before ⟨a, o, u⟩, Catalan uses ⟨ç, ss, s⟩ for [s] and ⟨j⟩ for [(d)ʒ]. The choice between ⟨ç⟩ and ⟨ss⟩ is often etymological, while ⟨s⟩ is most commonly employed at word boundaries. As an example, let’s take the Catalan verb alçar “to lift” and the Spanish noun pez “fish”.

Cat. alçar [alˈsaɾ] (infinitive)

Cat. alces [ˈalses] “you lift”

Sp. pez [peθ]

Sp. peces [ˈpeθes] (plural)

In Catalan, ⟨ç⟩ and ⟨c⟩ are used to maintain the pronunciation [s] throughout the paradigm. The same goes for Spanish, where ⟨z, c⟩ are part of an alternating orthographic paradigm.
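
The alternation in these paradigms can be expressed as a single rule. The Python sketch below is illustrative only; the function name and its parameters are hypothetical, and the stems are given without their final consonant so that the rule can supply it.

```python
def spell_form(stem, ending, before_front="c", elsewhere="ç"):
    """Join stem + grapheme for palatal c + ending.

    The grapheme is `before_front` when the ending starts with e or i,
    and `elsewhere` otherwise (including when there is no ending).
    """
    if ending and ending[0] in "ei":
        return stem + before_front + ending
    return stem + elsewhere + ending


# Catalan alçar / alces (ç ~ c)
print(spell_form("al", "ar"), spell_form("al", "es"))
# Spanish pez / peces (z ~ c)
print(spell_form("pe", "", elsewhere="z"), spell_form("pe", "es", elsewhere="z"))
```

The pronunciation stays constant across the paradigm while the grapheme changes with the following letter, which is the pattern the velar table is meant to capture.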

In this case, the orthographer has considerable freedom. The most common spellings for the sounds associated with palatal ⟨c⟩ and ⟨g⟩ in front of ⟨a, o, u⟩ are:

for ⟨c⟩: ci + vowel, z, ç/ss/s (also attested among RMLs: ch, tch, tg; other additional possibilities: sh, sch, x);
for ⟨g⟩: gi + vowel, j, ge + vowel (also attested among RMLs: z, sgi + vowel (for [ʒ]), sg (for [ʒ]); other additional possibilities: x, sj); see also §Latin yod.

Here is an example of a hypothetical velar table for Aostan Franco-Provençal.

	Non-palatal ([k, g])		Palatal ([ts, dz])
	before ⟨e, i⟩	before ⟨a, o, u⟩ and word finally	before ⟨e, i⟩	before ⟨a, o, u⟩ and word finally
Latin c	⟨ch⟩	⟨c⟩	⟨c⟩	⟨ch⟩
Latin g	⟨gu⟩	⟨g⟩	⟨g⟩	⟨j⟩

When planning an orthography, I strongly recommend filling in a velar table, in order to have a visual representation of a cohesive orthographic system.
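
A velar table can also be written down as data and checked for internal consistency. The following Python sketch encodes the two tables given above; the dictionary layout, the context labels and the collisions helper are assumptions made for this illustration, not part of ROR.

```python
# Each cell: (Latin letter, series, context) -> grapheme.
# "front" = before e, i; "back_or_final" = before a, o, u and word-finally.
CATALAN = {
    ("c", "non-palatal", "front"): "qu",
    ("c", "non-palatal", "back_or_final"): "c",
    ("c", "palatal", "front"): "c",
    ("c", "palatal", "back_or_final"): "ç",  # the table also lists ss
    ("g", "non-palatal", "front"): "gu",
    ("g", "non-palatal", "back_or_final"): "g",
    ("g", "palatal", "front"): "g",
    ("g", "palatal", "back_or_final"): "j",
}

AOSTAN = {
    ("c", "non-palatal", "front"): "ch",
    ("c", "non-palatal", "back_or_final"): "c",
    ("c", "palatal", "front"): "c",
    ("c", "palatal", "back_or_final"): "ch",
    ("g", "non-palatal", "front"): "gu",
    ("g", "non-palatal", "back_or_final"): "g",
    ("g", "palatal", "front"): "g",
    ("g", "palatal", "back_or_final"): "j",
}


def collisions(table):
    """Return (context, grapheme) pairs where one grapheme fills two
    cells of the same context, which would be ambiguous for the reader."""
    seen = set()
    clashes = []
    for (_letter, _series, context), grapheme in table.items():
        key = (context, grapheme)
        if key in seen:
            clashes.append(key)
        seen.add(key)
    return clashes


print(collisions(CATALAN), collisions(AOSTAN))
```

A grapheme may appear in both contexts (as ⟨c⟩ and ⟨g⟩ do) without clashing; only a repeat within the same context is reported.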

[ɲ] and [ʎ]

To write the palatal nasal and the palatal lateral approximant, there are specific Romance traditions and it is advisable to stick to them. They are illustrated below.

For [ɲ]: gn, ñ, nh, ny.
For [ʎ]: gli + vowel, ll, lh, vowel + ill + vowel, vowel + il (also attested among RMLs: gl, ly).

⟨gn⟩ is recommended only if some instances of [ɲ] come from Latin gn. ⟨ñ⟩ is recommended only if some instances of [ɲ] come from Latin nn. ⟨nh, ny⟩ can be used in any orthography.

⟨gli⟩ + vowel or ⟨gl⟩ are recommended only if some instances of [ʎ] come from Latin gl or from the cl sequence often found in diminutives and some other words (see Romansh [eʎ] egl “eye” from oc(u)lum). ⟨ll⟩ is recommended only if some instances of [ʎ] come from Latin ll. ⟨lh, ly⟩ can be used in any orthography. ⟨ill, il⟩ are not recommended because they can cause confusion and be mistaken for [jl(l)].
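
These etymological conditions can be summarised as a small decision procedure. The Python below is a sketch under the assumption that the orthographer already knows which Latin sequences feed each sound; the function names are hypothetical.

```python
def recommend_palatal_nasal(latin_sources):
    """Candidate graphemes for [ɲ], given the Latin sequences it comes from."""
    options = []
    if "gn" in latin_sources:
        options.append("gn")
    if "nn" in latin_sources:
        options.append("ñ")
    options += ["nh", "ny"]  # usable in any orthography
    return options


def recommend_palatal_lateral(latin_sources):
    """Candidate graphemes for [ʎ]; ill and il are never recommended."""
    options = []
    if "gl" in latin_sources or "cl" in latin_sources:
        options += ["gli + vowel", "gl"]
    if "ll" in latin_sources:
        options.append("ll")
    options += ["lh", "ly"]  # usable in any orthography
    return options


print(recommend_palatal_nasal({"gn"}))
print(recommend_palatal_lateral({"cl"}))
```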

The use of ⟨x⟩

It is possible to use ⟨x⟩ to represent a reflex of Latin cs (x) which is different from [s]. See the following examples for “thigh”, from Latin coxam. Other possibilities exist for this sound—namely sci + vowel, sch, sh, ch, ş.

Genoese Ligurian [ˈkøʃa] $cheuxa

Picard (Nort-Leulinghem) [ˈkɥiʃ] $cuix

Bolognese Emilian [ˈkɔːʃa] $còxa

⟨sci⟩ + vowel and ⟨sch⟩ are especially recommended for a sound (usually [ʃ]) which comes, in at least some instances, from Latin sc.

What to do with the remaining consonants

There is no other recommendation for the representation of the remaining consonants other than using the languages of reference as a guide.

For example, let’s take the affricates [ts, dz]. Assuming that these affricates are not the realisation of palatal ⟨c⟩ and ⟨g⟩ in front of ⟨e, i⟩ (as happens in Franco-Provençal, for example), one could choose to adhere to the Italian convention and use ⟨z(z)⟩; or to the Catalan and Sardinian convention using ⟨tz⟩; or even to the Romanian convention using ⟨ţ⟩ (for [ts]); or to the medieval Iberian convention by using ⟨ç⟩ (for [ts]). The orthographer might also combine these strategies to differentiate between [ts] and [dz], if needed.

Vowels

Nasal vowels

There are two main strategies for marking nasal vowels in the languages of reference; a third, found outside Romance, is listed as a fallback.

The Portuguese route. Portuguese marks its nasal vowels with a tilde on vowels: ⟨ã, õ⟩.

The French route. French uses mainly ⟨m, n⟩ to mark nasal vowels: ⟨am, an, aim, em, en, eim, ein, eun, im, in, um, un⟩ etc.

The Breton route. This strategy is not found among the Romance languages, but it can be useful if it is difficult to find another solution. Breton, a Celtic language, marks nasality by writing ⟨ñ⟩ after the vowel.

The choice rests on the shoulders of the orthographer. One must avoid the crowding of diacritics whilst also attempting to represent the phonemes of the language faithfully. Accessibility is also an issue: font support and keyboard layouts may influence this choice (see §Diacritics).

Front rounded vowels

It is preferable to not use diacritics to mark front rounded vowels; instead, the use of bare vowel letters or multigraphs is encouraged. Some options are as follows.

For [ø, œ]: eu, oeu/œu. (Also possible: oe/œ; other digraphs may be useful, such as oi, eo.)
For [y]: u. (Some useful digraphs, not found in Romance languages for [y]: ui, iu.)

If every other option fails to meet the goals of the orthographer, ROR suggests using the ‘double dot above’ diacritic on vowels, mainly in stressed syllables. This is only possible if the orthography does not use this diacritic as a diaeresis, i.e. to divide a diphthong or to highlight other features already discussed.

It might have been noticed that ROR proposes ⟨u⟩ for [y]. This is found in French, where, in contrast, [u] is ⟨ou⟩. Some RMLs have a distinction between [ɔ~o, u, y] and, again, this choice is left to the orthographer. Striking a balance between aesthetics and phonemic faithfulness is particularly challenging, especially when the language presents all of [ɔ~o, u, y], possibly in unstressed syllables too (see §Unstressed and reduced vowels and schwa ([ə])). In stressed syllables, it might be useful to use ⟨u/ú⟩, ⟨o/ó⟩, ⟨ò⟩ for [y, u, ɔ~o] respectively. Alternatively, one might take the French route: ⟨u, ou, o⟩. Where the RML’s phonology distinguishes even more vowels, it may be reasonable to allow some degree of phonological ambiguity to minimise the use of diacritics.

Unstressed and reduced vowels and schwa ([ə])

As already stated, diacritics in unstressed syllables of languages with mobile stress are discouraged. In RMLs that have fixed stress placement, however, the diacritics that are normally used for stress (namely the acute and grave accents) are free to be used for indicating vowel quality instead. This mainly happens in Belgium and northern France.

Another important thing to keep in mind is stress alternation and paradigmatic consistency. In Eastern Central Catalan, [o, ɔ] always reduce to [u] in unstressed syllables. This means that [u] is written ⟨o⟩ in those unstressed syllables where it would be [o, ɔ] somewhere else in the paradigm. See Eastern Central Catalan plorar [pluˈɾa] “to cry” and ploro [ˈplɔɾu] “I cry”.

Some RMLs make a distinction between all of [e, ɛ, ə]. These vowels are traditionally written ⟨e⟩ in the languages of reference; but, especially in unstressed syllables in languages with mobile stress, it is hard to follow ROR guidelines and distinguish all three at the same time. As stated for [ɔ~o] vs [u] vs [y], it might be better to accept ambiguity rather than overcomplicate the orthography.

Other guidelines for vowels

It might be useful to resort to etymology to avoid the crowding of diacritics.

For example, in many varieties of Franco-Provençal, Latin i, as well as any [i] that arose at some later point in the language, evolved into [ø] in some instances. Since this is a fairly common sound, it may be best to spell it ⟨i⟩, while reserving some other solution, maybe a diacritic on ⟨i⟩, for [i]. For example, one might spell Aostan Franco-Provençal [ˈtsøvra] “goat” as $chivra; or [ˈføʎʎə] “daughter” as $filye (from Latin fīliam).

Stress is best indicated through acute and grave accents (for the difference between the two see §Vowel quality), but there can be exceptions. As already stated, diacritics in unstressed syllables of languages with mobile stress are discouraged; a diacritic that marks vowel quality will therefore tend to appear only on stressed vowels, where it can indicate stress placement and vowel quality at the same time. See the following example.

Latin cinerem → Piedmontese $cënner [ˈsənnɛr]

Here, I chose an umlaut on ⟨e⟩, which is already established in Piedmontese, to indicate that the pronunciation is not [e~ɛ], but [ə]. The umlaut is allowed because it sits on a stressed vowel. In practice, ⟨¨⟩ marks both stress and vowel quality.

Length

Vowel length distinctions, present historically in some of the reference languages, have been lost in most contemporary dialects. I will present two proposals for representing vowel length within a reintegrationist orthographic framework.

Circumflex accent. As per the French tradition, a circumflex may be used to mark vowel length. For example, one might choose to spell Zoagli Ligurian [fuˈgwaː] “hearth” as $fogoâ.

Doubling the following consonant. This strategy is attested in Emilian and can be utilised if a circumflex is not a viable choice. See the example of månnd in §Economy.

Vowel quality

The following are proposed strategies for representing vowel qualities. It is essential that the orthographer take into account all the considerations previously outlined. These suggestions are intentionally broad and may not be suitable for every RML. Etymology may offer valuable guidance in the search for an appropriate orthographic solution.

Sound	Graphemes
[a]	⟨a⟩
[e]	⟨e, é⟩
[ɛ]	⟨e, è⟩
[i]	⟨i⟩
[o]	⟨o, ó⟩
[ɔ]	⟨o, ò⟩
[u]	⟨u, ou, o, ó⟩
[y]	⟨u, ui, iu⟩
[ø~œ]	⟨eu, oeu, œu, oe, œ, eo⟩
[ɨ]	⟨ï, i, ë⟩
[ə]	⟨e, ë⟩
[ʌ~ɤ]	⟨ŏ, ă⟩
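
Because several graphemes in this table are listed for more than one sound, it can help to see the overlaps at a glance. The Python sketch below transcribes the table as a dictionary and inverts it; the names VOWEL_OPTIONS and shared_graphemes are hypothetical, and the code says nothing about which overlaps are acceptable in a given RML.

```python
# The table above, transcribed as sound -> candidate graphemes.
VOWEL_OPTIONS = {
    "a": ["a"],
    "e": ["e", "é"],
    "ɛ": ["e", "è"],
    "i": ["i"],
    "o": ["o", "ó"],
    "ɔ": ["o", "ò"],
    "u": ["u", "ou", "o", "ó"],
    "y": ["u", "ui", "iu"],
    "ø~œ": ["eu", "oeu", "œu", "oe", "œ", "eo"],
    "ɨ": ["ï", "i", "ë"],
    "ə": ["e", "ë"],
    "ʌ~ɤ": ["ŏ", "ă"],
}


def shared_graphemes(options):
    """Graphemes proposed for more than one sound, with those sounds."""
    sounds_by_grapheme = {}
    for sound, graphemes in options.items():
        for grapheme in graphemes:
            sounds_by_grapheme.setdefault(grapheme, []).append(sound)
    return {g: s for g, s in sounds_by_grapheme.items() if len(s) > 1}


for grapheme, sounds in shared_graphemes(VOWEL_OPTIONS).items():
    print(grapheme, sounds)
```

Passing a reduced dictionary that contains only the vowels and graphemes actually chosen for a given language shows which ambiguities remain in that specific proposal.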

✻ ✻ ✻

Further considerations

Sound vs phoneme

It might have been noticed that the words sound, phone and phoneme have not been properly differentiated in this proposal. This is because another choice that the orthographer has to make is whether to represent phonemes (i.e. distinguish in writing only the sounds that present minimal pairs in the language) or to also mark non-phonemic distinctions. ROR leaves the orthographer or language community free choice in this case.

Recycling traditional conventions

The last recommendation Romance Orthographic Reintegrationism makes is to reuse graphemes or conventions that have been traditionally used in the orthography, as long as they do not violate the other principles of ROR.

For example, it has been brought to my attention that Venetian ⟨j⟩ (i.e. following the so-called “Catalan route”—see §Latin yod) is appropriate according to the prescriptions described in §Latin yod and following paragraphs, but it would not be intuitive to native speakers. Instead, it has been suggested to adhere to the “Italian route” and mark both the reflex of Latin yod and intervocalic gi/di in the same way, using ⟨z⟩.

While adherence to reference language conventions strengthens the reintegrationist aim, practicality and intuitive readability for native users must not be disregarded. ROR recognises that orthographic acceptance depends not only on linguistic rigour but also on community resonance; therefore, it supports adapting historically justified conventions in ways that remain accessible and meaningful within each specific linguistic context.