User’s guide



of target

of target


    plain  seri  seri, sri, berseri, serinya ...
    strict  "seri"  seri, berseri ...   (not sri, bersri ... )
    root  /seri/  seri   (not sri, berseri ...)
    exact  |sri|  sri   (not seri, bersri ...)   searching raw text forms

An exact search will produce unstable results if used to search more than one text.

    jawi  s.r.y  seri, sari, serai ...
    wildcard  s?r*  seri, sari, serigala, serta, serapat ...
    regexp  /se[^r]i/  segi, semi, seni, sepi ...
Some texts contain special characters, such as é.  However, at present it is only possible to search for the plain characters.

  plain target

In a Simple search, a plain target looks for all the forms of the word, with and without affixes and reduplication. 
Some extraneous forms may be found.  For instance, a search for  maka  might pick up memakai (taking this as a me--i form).  To avoid this, use a Morphology search to specify the possible affixes.

  strict target

A strict target does not (generally) find variant forms of the target.  For instance, seeking "syariah" would not find shariah or syariat.

  root target

A root target does not find derived affixed forms.  It is the equivalent of using a Morphology search and specifying  "− affixes".

  exact target

An exact target finds occurrences of this form in the text.  It is only useful for texts that have non-standard spellings.  For instance, |`adat| will find places where the word is spelt `adat, and not as adat.  However, most editions of texts used in the MCP have been largely normalised.   Non-standard spellings may be most conveniently identified and accessed from the Formal word lists.

  jawi target

Jawi spellings are conventionally transcribed in Roman letters by linking the letters with stops.  A target in this form will look for words which could fit this Jawi spelling.  Thus:, tanggal, tunggal
a.w.l.h    olah, oleh
s.i.l    sila, sial
Only very simple jawi targets are feasible.
Letters usually transcribed with a diacritic may be followed by a colon, thus:
z:h.r    zuhur, lohor
Note that a Jawi search has no access to the manuscript's original Jawi text.  It is limited by the Roman transcription.  Aspects of the Jawi spelling which are not well reflected in the Romanised form cannot be reliably retrieved.  A Jawi-style search can therefore return only possibilities: the results may not reflect an original underlying Jawi spelling.
Jawi targets always define whole words.  
Angka dua cannot be used to represent reduplication.
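
The idea of a Jawi target can be illustrated with a small Python sketch.  It is only an illustration under stated assumptions, not the MCP's own method: the helper name jawi_to_regex, the tiny ROMAN table and the rule that one unwritten vowel may stand between letters are all hypothetical.  Letters marked with a colon are not handled.

```python
import re

# Hypothetical, deliberately tiny table of Roman spellings that a
# transcribed Jawi letter might stand for (an assumption for illustration).
ROMAN = {
    "a": ["a", ""],               # alif: a vowel, or a silent carrier
    "w": ["w", "u", "o", "au"],   # wau
    "y": ["y", "i", "e", "ai"],   # ya
}
UNWRITTEN = "[aeiou]?"            # assumed: one unwritten vowel between letters

def jawi_to_regex(target):
    """Build a whole-word regex from a stop-separated Jawi target."""
    parts = []
    for letter in target.split("."):
        options = ROMAN.get(letter, [letter])   # other letters stand for themselves
        parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
    return re.compile(UNWRITTEN.join(parts))

def fits(target, word):
    """True if the whole word could fit the Jawi target under this sketch."""
    return jawi_to_regex(target).fullmatch(word) is not None

print(fits("s.r.y", "seri"), fits("s.r.y", "serai"), fits("a.w.l.h", "oleh"))
```

Because the sketch works only from Roman spellings, it shares the limitation noted above: it returns candidates that could fit the spelling, not attested Jawi forms.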

  wildcard target

Four wildcards are available:
?  matches any single letter
* or ...  matches any number of letters (or none)
C  matches any consonant [with kh, sy, ng, ny taken as single consonants]
V  matches any vowel or diphthong [with ai, au, oi as diphthongs]
(but note that the diphthong in pakai and the double vowel in punyai are both treated as diphthongs)
umma?    ummah, ummat
kata*    katanya, katakan, kataku, katalah ...
???    apa, dia, sia, ... (i.e. all three-letter words)
*-*    ada-ada, banyak-banyak, ... berzaman-zamannya, etc. (i.e. all reduplicated forms)
VCV    acu, ada, aku, akui ...
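
The wildcard rules above can be sketched as a translation into an ordinary regular expression.  The following Python is a minimal illustration only, not the search program's own code: the helper names wildcard_to_regex and matches, and the exact vowel and consonant inventories, are assumptions drawn from the descriptions above.

```python
import re

# Assumed inventories: digraphs count as single consonants, and
# ai/au/oi count as diphthongs, as described in the wildcard list.
CONSONANT = r"(?:kh|sy|ng|ny|[b-df-hj-np-tv-z])"
VOWEL = r"(?:ai|au|oi|[aeiou])"

def wildcard_to_regex(target):
    """Translate a wildcard target into a compiled regex (hypothetical helper)."""
    target = target.replace("...", "*")   # "..." is another way of writing "*"
    parts = []
    for ch in target:
        if ch == "?":
            parts.append("[a-z]")         # any single letter
        elif ch == "*":
            parts.append("[a-z]*")        # any number of letters, or none
        elif ch == "C":
            parts.append(CONSONANT)       # any consonant
        elif ch == "V":
            parts.append(VOWEL)           # any vowel or diphthong
        else:
            parts.append(re.escape(ch.lower()))
    return re.compile("".join(parts))

def matches(target, word):
    """True if the whole word fits the wildcard target."""
    return wildcard_to_regex(target).fullmatch(word.lower()) is not None

print(matches("umma?", "ummat"), matches("*-*", "ada-ada"), matches("VCV", "aku"))
```

The sketch treats only ai, au and oi as diphthongs, so it does not reproduce every listed result: a form such as akui would not match VCV here.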

  regexp target

For more complex patterns, many of the standard conventions of regular expressions can be used.
A regexp target must be enclosed in / / slashes.
.  matches any single character
?  the preceding character or nothing
+  matches any one or more of the preceding character
*  matches any number of the preceding character, or none
[ ]  any one of the characters enclosed in the brackets
[^ ]  any character except those in the brackets
{2}  two of the preceding character
{1,3}  one to three of the preceding character
For convenience, the V and C wildcards may also be used, but they cannot be grouped [ ] or negated [^ ].
/, malam, masam, etc.
/.../  or  /.{3}/    all three-letter words, apa, dia, etc.
/buah.*/    buah, buahnya, buahan, etc. but not  buah-buah
/syariah?/    syaria, syariah
/s[hy]aria[ht]?/    sharia, shariah, shariat, syaria, syariah, syariat
/majal+ah/    majallah, majalah
/t[iu]nggal/    tinggal, tunggal
/t[^i]nggal/    tanggal, tunggal, etc. but not   tinggal
/a[lr]+ah/    alah, Allah, arah
/[^m]?akan/    akan, rakan, etc. but not  makan
/[bp]CV/    berbau, pergi, etc.
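
Several of the examples above can be tried out with Python's standard re module, used here only as a stand-in for the search program.  The helper whole_word_matches and the word list are made up for illustration, and it is an assumption that a target must match the whole word and that case is ignored.  The V and C wildcards are not supported in this sketch.

```python
import re

def whole_word_matches(pattern, words):
    """Return the words that the pattern matches in full, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.fullmatch(w)]

# A small made-up word list for demonstration.
words = ["syaria", "syariah", "syariat", "shariah",
         "tanggal", "tinggal", "tunggal",
         "akan", "makan", "rakan",
         "Allah", "alah", "arah"]

print(whole_word_matches(r"syariah?", words))        # ['syaria', 'syariah']
print(whole_word_matches(r"s[hy]aria[ht]?", words))  # ['syaria', 'syariah', 'syariat', 'shariah']
print(whole_word_matches(r"t[^i]nggal", words))      # ['tanggal', 'tunggal']
print(whole_word_matches(r"[^m]?akan", words))       # ['akan', 'rakan']
print(whole_word_matches(r"a[lr]+ah", words))        # ['Allah', 'alah', 'arah']
```

Python's conventions are not guaranteed to agree with the search program in every detail.  In Python, for example, the dot also matches a hyphen, so buah.* would match buah-buah in this sketch.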


  • Word-phrases that are sometimes single words, like hari bulan / haribulan, can be treated as single words.
  • Word couples like sini-sana or tempik-sorak are treated as separate targets.
  • Reduplicated forms like puji-pujian are always single targets that can be specified using the Morphology search.
  • Infixed forms like turun-temurun, gilang-gemilang, etc. can be found by searching for turun, gilang etc.

  Simple searches

Use a Simple search for simple two-word phrases. To find Sultan Ahmad or (ber)jual-beli or gegak-gempita enter the components as the two targets.

  Optioned searches

To find wider associations, use the Optioned search.
(This also applies to Morphological and Chronological searches.)
For instance, to find makan occurring near minum:
  1. set the two targets as makan and minum;
  2. select ‘with’;
  3. specify the range over which you want to look for an association (usually a range of 4 words gives worthwhile results);
  4. optionally constrain the search to look for associated words only on the right or the left of the first target.


To find contexts in which makan occurs without minum,  make the same Optioned search but click ‘excluding’ and all the remaining occurrences of makan will be reported.
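
The logic of ‘with’ and ‘excluding’ can be sketched in Python as a window check around each occurrence of the first target.  This is a simplified illustration, not the search program itself: the function optioned_search and its parameters are hypothetical, and words are compared as exact forms, with none of the affix handling described earlier.

```python
def optioned_search(tokens, target1, target2, span=4,
                    left=True, right=True, excluding=False):
    """Return positions of target1 that do (or, if excluding, do not)
    have target2 within `span` words on the permitted sides."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != target1:
            continue
        window = []
        if left:
            window += tokens[max(0, i - span):i]      # words to the left
        if right:
            window += tokens[i + 1:i + 1 + span]      # words to the right
        found = target2 in window
        if found != excluding:                        # 'with' keeps found, 'excluding' keeps not found
            hits.append(i)
    return hits

text = "maka ia pun makan dan minum lalu tidur kemudian bangun lalu makan nasi"
tokens = text.split()
print(optioned_search(tokens, "makan", "minum"))                  # [3]
print(optioned_search(tokens, "makan", "minum", excluding=True))  # [11]
```

Every occurrence of the first target falls into exactly one of the two result sets, which is why ‘excluding’ reports all the remaining occurrences.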


Use a Morphology search to find particular combinations of affixes:
  1. with a specified root
  2. with any root

  The menus

Use the menus to specify combinations of affixes.
In all menus:
  means no affix appears in this position.
~  means any affix in the menu may be present or absent.
a selected affix must be present.
alternative affixes can be selected from the menu by using "control click" (except for reduplications).
The two menus available for prefixes will support searches for multiple prefixes:   e.g. diper-,  seber-,  sese-, etc.
In the first menu, di/meN- will search for any of the affixes in this class ( di- ku- kau- me- meng- men- mem-) but will NOT capture the root form of the verb used in constructions like 'yang adinda hendaki'
The menu:
   excludes reduplicated forms
~   any reduplicated or infixed forms
use other menus to specify the target further, e.g. ber- for reduplicated ber- forms.
R-R   reduplications of the root:  buah-buah, buah-buahan.
use other menus to specify the target further, e.g. ke- -an for kemalu-maluan etc.
S-S   reduplications of the stem:  buahan-buahan (but not buah-buahan)
use other menus to specify the target further, e.g. ke- -an for kemaluan-kemaluan etc.
R-ber/meN R   reciprocity:  kata-berkata, tulis-menulis, etc.
use the first menu to specify the ber- or meN- variants.
ber R-R an   mutuality:  bermain-mainan, etc.
cannot be specified further.
  Root undefined
In searches for morphological forms it is convenient to leave the root undefined:  this is done by clicking the "root unspecified" button.
In this case, a nominal root is posited by the search program.  The nominal root comprises a minimum of 4 at least:  CVC  or  CCV or  ~V~V  (where C is any consonant grapheme, and V any vowel).  This should pick up all relevant forms, but it will also pull in some spurious ones.  In particular the -i suffix cannot be conveniently retrieved in this way.  Is sakai based on the root saka?  There is still a need for human oversight.
Use the two targets to find syntactical relationships.
E.g. to find combinations of suruh with following me- or di- forms, for target 1 enter suruh ;  then for target 2 select "root unspecified" and select di/meN from the first menu.  Using the middle panel, select "present", "proximity", "1" to the right, and activate right associations (or deactivate left associations).
The variations are endless.  You may enter "root unspecified" roots in both targets, for instance, to look for contexts in which me- and ber- forms occur side by side.


The chronology of classical Malay texts is mostly far from clear, and is further muddied by the problems of variant versions, and the dating of manuscripts as opposed to the texts that the manuscripts contain.  Notwithstanding these uncertainties, the following order has been adopted for the texts available in the MCP.   The sequence is based on an estimate of the date at which a text was written or compiled, not the date of the manuscript from which the MCP text has been derived.   For these estimates, see the chronological list of texts.


To find rhyme words, use the list of rhyme words appearing in each text.  These lists are accessible through the list of texts.


Location references

Location references typically denote the text, and the pages and lines, or stanzas and verses, in the text.  The system used for each text is explained in its Bibliography.
Tuah 402:33
Hovering the mouse over a reference reveals the full title and the dates of the text and manuscript.  Clicking on a reference will open a window giving bibliographical information.
Pwng p117
This form of link-reference behaves similarly, but clicking on it will open a window showing a text extract, illustration, or the like.


The aim is to keep the texts as uncluttered as possible. 
* an asterisk will open a pop-up window providing a note on a disputed reading, difficulty, or explanation.   These are only relevant for detailed consideration of the text.
© this sign will open a pop-up window providing an illustration attached to the text: a diagram, or non-Malay text.
(These symbols may appear grey or highlighted as hyperlinks depending on your browser.)
¶ § :: / | , ; : -- Elements in grey are punctuation added to the text to indicate verses, lists, paragraphs and the like.   A paragraph or section § marker indicates the beginning or end of an independent section of a composite text: for instance, of a letter in the collection Warkah Warisan Melayu.
The verse markings are:
|    beginning and end of stanzas;
;    end of a couplet;
,    end of a verse;
/    end of a rhythmic phrase;
:    separating items in a list;
Also in grey are sections of the text not included in the indexes -- usually Arabic quotations.


The printed texts follow differing punctuation conventions.   As far as possible, these are adjusted to a common pattern:
[ ] enclose material added to the text by the editor.
( ) enclose material in the text which the editor would omit.
In complex texts:
« » enclose material quoted from the base text which is being glossed.
[[ ]] enclose material added as marginal commentary to the base text.


The best general purpose display is the Key Words In Context (KWIC) format.  This is an economical and effective way of giving a quick overview of the word in its semantic contexts.  With the contexts sorted alphabetically, the KWIC display also readily reveals the recurrence of conventional phrases and formulaic expressions.
If you wish to work with broader contexts than the KWIC format allows, switch to "fuller contexts".
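
How a KWIC listing is assembled can be sketched in a few lines of Python.  This is an illustration only, not the display code itself: the functions kwic and format_kwic, the context width of three words and the choice to sort on the right-hand context are all assumptions.

```python
def kwic(tokens, keyword, width=3):
    """Collect (left context, keyword, right context) for each occurrence,
    sorted alphabetically on the right-hand context (an assumed sort key)."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    rows.sort(key=lambda row: row[2].lower())
    return rows

def format_kwic(rows):
    """Right-align the left contexts so that the keywords line up in a column."""
    pad = max((len(left) for left, _, _ in rows), default=0)
    return [f"{left:>{pad}}  {kw}  {right}" for left, kw, right in rows]

text = "maka titah baginda kepada bendahara maka sembah bendahara kepada baginda"
for line in format_kwic(kwic(text.split(), "maka")):
    print(line)
```

Sorting on the context next to the keyword is what brings repeated phrases onto adjacent lines, where they are easy to spot.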


  displaying phrases
Formulaic phrases will cluster better if the list of forms found is broken up into separate blocks according to the precise form of the word used in target 1 or in both targets 1 & 2.
When the association is not a set phrase but an association of ideas, it may show up better if the list is organised by "text".
  displaying distributions
The changing emphases within a text may be revealed by how a key word or phrase is distributed within it.  To get this perspective, organise the list by "text".
For shifting meanings of a word over several texts and across time, an alternative is to organise the list by "dates".  Be aware, however, that the dates of earlier texts are often indefinite and hypothetical.  See the note on this above.
  displaying patterns
With a morphological query, if the target is a syntactic form, it may be best displayed without dividing the listing, selecting "0".


Text notes are generally limited to editorial matters:  other possible readings, likely scribal errors, and so on.  When possible, the jawi forms of proper names or difficult words may be given.

Jawi transcription

  transcription table
a  b  t  th  j  c  h  kh  d  dh  r  z  s  sy  s  d  t  z  `  gh  ng  f  p  q  k  g  l  m  n  w  h  ´  y  ny
alif maksurah    .a
ta marbutah    .t" / .h"
angka dua    .2
baris di atas /   .ta   .tha   .ja   .ha   etc.
baris di bawah /
baris di hadapan / dammah    .bu
tanda mati / jazm    .b°
m.r.y.k.´.y.t    mereka itu



The integrity of the search results depends on several factors:
  • the editorial decisions of the scholars responsible for editing and publishing the texts
  • the inputting of the texts, through manual transcriptions or optical character recognition
  • the indexing of the texts, which involves particularly identifying non-standard spellings and managing word divisions
  • the algorithms used by the search program.
There is room for error at every step along the way.  Every care has been taken to develop search algorithms that will produce reliable results -- neither too likely to include unwanted forms, nor too likely to pass over desired forms.  There is no guarantee that this will be achieved in all cases.  The more undefined elements there are in the search, the more room there is for slippage.  Moreover the success of the algorithm depends upon the quality of the data.