MCP: user’s guide


Malay Concordance Project


User’s guide

searching
words
phrases
morphology
chronology
rhymes

reports
references
annotations
punctuation
display
organisation

notes
jawi
transcription

caveats
integrity

Searching

Words

type of target	form of target	finds

plain	seri	seri, sri, berseri, serinya ...
strict	"seri"	seri, berseri ... (not sri, bersri ...)
root	/seri/	seri (not sri, berseri ...)
exact	|sri|	sri (not seri, bersri ...)
jawi	s.r.y	seri, sari, serai ...
wildcard	s?r*	seri, sari, serigala, serta, serapat ...
regexp	/se[^r]i/	segi, semi, seni, sepi ...

An exact target searches raw text forms, so it will produce unstable results if used to search more than one text.

Some texts contain special characters, such as é. However, at present it is only possible to search for the plain characters.

plain target

In a Simple search, a plain target looks for all the forms of the word, with and without affixes and reduplication.

Some extraneous forms may be found. For instance, a search for maka might pick up memakai (taking it as a me- -i form of maka). To avoid this, use a Morphology search to specify the possible affixes.
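The memakai problem is easy to reproduce. The Python sketch below strips at most one prefix and one suffix from a word and collects every leftover stem as a possible root. The affix lists, the `RESTORE` table for nasal prefixes and the function names are assumptions made for illustration; this is not the MCP's own algorithm, and it ignores spelling variants such as sri.

```python
# Hypothetical, much-simplified affix lists (not the MCP's own).
PREFIXES = ["", "me", "mem", "men", "meng", "meny", "ber", "be", "di",
            "ter", "per", "pe", "ke", "se"]
SUFFIXES = ["", "an", "i", "kan", "nya", "lah", "kah"]
# Nasal prefixes that usually replace the first consonant of the root.
RESTORE = {"mem": "p", "men": "t", "meng": "k", "meny": "s"}


def candidate_roots(form: str) -> set[str]:
    """Every stem left after stripping one (possibly empty) prefix and suffix."""
    form = form.lower()
    roots = set()
    for pre in PREFIXES:
        if not form.startswith(pre):
            continue
        for suf in SUFFIXES:
            if not form.endswith(suf):
                continue
            stem = form[len(pre):len(form) - len(suf)]
            if len(stem) < 3:              # too short to be a plausible root
                continue
            roots.add(stem)
            if pre in RESTORE:             # memakai -> pakai, menulis -> tulis
                roots.add(RESTORE[pre] + stem)
    return roots


def plain_match(target: str, form: str) -> bool:
    """True if any hyphen-separated part of `form` could be built on `target`."""
    return any(target in candidate_roots(part) for part in form.split("-"))
```

Both pakai and maka come out of memakai, so a naive plain search for maka would report it; fixing the permitted affixes in a Morphology search is what removes such hits.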

strict target

A strict target does not (generally) find variant forms of the target. For instance, seeking "syariah" would not find shariah or syariat.

root target

A root target does not find derived affixed forms. It is the equivalent of using a Morphology search and specifying "− affixes".

exact target

An exact target finds occurrences of this precise form in the text. It is useful only for texts that have non-standard spellings. For instance, |`adat| will find places where the word is spelt `adat, but not those where it is spelt adat. However, most editions of texts used in the MCP have been largely normalised. Non-standard spellings may be most conveniently identified and accessed from the Formal word lists.

jawi target

Jawi spellings are conventionally transcribed in Roman letters by linking the letters with stops. A target in this form will look for words which could fit this Jawi spelling. Thus:

t.ng.g.l	tinggal, tanggal, tunggal
a.w.l.h	olah, oleh
s.i.l	sila, sial

Only very simple jawi targets are feasible.

Letters usually transcribed with a diacritic may be followed by a colon, thus:

z:h.r	zuhur, lohor

Note that a Jawi search has no access to the manuscript's original Jawi text. It is limited by the Roman transcription. Aspects of the Jawi spelling which are not well reflected in the Romanised form cannot be reliably retrieved. A Jawi-style search can therefore return only possibilities: the results may not reflect an original underlying Jawi spelling.

Jawi targets always define whole words. Angka dua cannot be used to represent reduplication.
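The kind of matching described above can be sketched, under loud assumptions, by turning each stop-separated letter into a piece of a regular expression. Here the three letters alif, wau and ya are given a small set of vowel readings, and every other letter is taken as a consonant that may be followed by an unwritten short vowel. The mapping and the names `jawi_to_regex` and `jawi_matches` are hypothetical; the MCP's own rules must differ, since this sketch simply drops the colon marker and cannot produce a reading such as lohor for z:h.r.

```python
import re

# Assumed vowel readings for the three Jawi letters that can stand for vowels.
VOWEL_LETTERS = {
    "a": "a?",     # alif: long a, or a silent carrier for the following vowel
    "w": "[uow]",  # wau: u, o or consonantal w
    "y": "[iey]",  # ya: i, e or consonantal y
}


def jawi_to_regex(target: str) -> re.Pattern:
    """Compile a stop-separated Jawi target such as 't.ng.g.l' for whole-word matching."""
    parts = []
    for letter in target.lower().split("."):
        letter = letter.replace(":", "")   # drop the diacritic marker: z: -> z
        if letter in VOWEL_LETTERS:
            parts.append(VOWEL_LETTERS[letter])
        else:
            # Any other letter: a consonant, optionally followed by an unwritten short vowel.
            parts.append(re.escape(letter) + "[aeiou]?")
    return re.compile("".join(parts), re.IGNORECASE)


def jawi_matches(target: str, words: list[str]) -> list[str]:
    """Return the romanised words that could fit the Jawi spelling."""
    pattern = jawi_to_regex(target)
    return [w for w in words if pattern.fullmatch(w)]
```

For example, `jawi_matches("s.r.y", ["seri", "sari", "serai", "sila"])` keeps seri, sari and serai, which is the "possibilities, not certainties" behaviour the note above describes.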

wildcard target

Four wildcards are available:

?	matches any single letter
* or ...	matches any number of letters (or none)
C	matches any consonant (kh, sy, ng and ny are taken as single consonants)
V	matches any vowel or diphthong (ai, au and oi are taken as diphthongs; but note that the diphthong in pakai and the double vowel in punyai are both treated as diphthongs)

Examples:

umma?	ummah, ummat
kata*	katanya, katakan, kataku, katalah ...
???	apa, dia, sia ... (i.e. all three-letter words)
*-*	ada-ada, banyak-banyak ... berzaman-zamannya, etc. (i.e. all reduplicated forms)
VCV	acu, ada, aku, akui ...
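The four wildcards translate naturally into a regular expression. This Python sketch assumes that a wildcard target must match a whole word, that letters are compared without regard to case, and that * and ... stand only for letters (so the hyphen in *-* is matched literally). The names and the exact consonant and vowel classes are assumptions, not the MCP's code. Because regular expressions backtrack, a digraph such as ng is tried first as one consonant but can still be split when that is the only way to match, and the sketch does not treat ui as a diphthong, so it will not return akui for VCV.

```python
import re

# Assumed sound classes: digraphs and diphthongs are listed first so that they
# are tried as single units before single letters.
CONSONANT = r"(?:kh|sy|ng|ny|[bcdfghjklmnpqrstvwxyz])"
VOWEL = r"(?:ai|au|oi|[aeiou])"

# Hypothetical translation table from wildcard symbols to regex fragments.
WILDCARDS = {
    "?": "[a-z]",    # exactly one letter
    "*": "[a-z]*",   # any number of letters, or none
    "C": CONSONANT,  # any consonant grapheme
    "V": VOWEL,      # any vowel or diphthong
}


def wildcard_to_regex(target: str) -> re.Pattern:
    """Compile a wildcard target such as 'umma?' or 'VCV' for whole-word matching."""
    target = target.replace("...", "*")   # '...' is a synonym for '*'
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in target]
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_matches(target: str, words: list[str]) -> list[str]:
    """Return the words that the wildcard target matches in full."""
    pattern = wildcard_to_regex(target)
    return [w for w in words if pattern.fullmatch(w)]
```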

regexp target

For more complex patterns, many of the standard conventions of regular expressions can be used. A regexp target must be enclosed in / / slashes:

.	matches any single character
?	matches the preceding character or nothing
+	matches one or more of the preceding character
*	matches any number of the preceding character, or none
[ ]	matches any one of the characters enclosed in the brackets
[^ ]	matches any character except those in the brackets
{2}	matches two of the preceding character
{1,3}	matches one to three of the preceding character

For convenience, the V and C wildcards may also be used, but they cannot be grouped in [ ] or negated with [^ ].
Examples:

/ma.am/ makam, malam, masam, etc.

/.../ or /.{3}/ all three-letter words, apa, dia, etc.

/buah.*/ buah, buahnya, buahan, etc. but not buah-buah

/syariah?/ syaria, syariah

/s[hy]aria[ht]?/ sharia, shariah, shariat, syaria, syariah, syariat

/majal+ah/ majallah, majalah

/t[iu]nggal/ tinggal, tunggal

/t[^i]nggal/ tanggal, tunggal, etc. but not tinggal

/a[lr]+ah/ alah, Allah, arah

/[^m]?akan/ akan, rakan, etc. but not makan

/[bp]CV/ berbau, pergi, etc.
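A regexp target can be handled the same way: strip the slashes, expand the C and V shortcuts wherever they stand outside brackets, and match the result against whole words. In this Python sketch two choices are assumptions rather than documented MCP behaviour: matching ignores case (so /a[lr]+ah/ finds Allah), and . is taken not to match the reduplication hyphen, which is one way to make /buah.*/ exclude buah-buah. The function names are hypothetical. With whole-word matching some examples above behave differently here; /[bp]CV/, for instance, matches only very short words, not berbau or pergi.

```python
import re

CONSONANT = r"(?:kh|sy|ng|ny|[bcdfghjklmnpqrstvwxyz])"
VOWEL = r"(?:ai|au|oi|[aeiou])"
# Shortcuts expanded only outside [ ]; '.' is assumed not to cross a hyphen.
SHORTCUTS = {"C": CONSONANT, "V": VOWEL, ".": "[^-]"}


def regexp_target_to_regex(target: str) -> re.Pattern:
    """Compile a slash-delimited regexp target such as '/t[^i]nggal/'."""
    if len(target) < 2 or not (target.startswith("/") and target.endswith("/")):
        raise ValueError("a regexp target must be enclosed in / / slashes")
    out = []
    in_brackets = False
    for ch in target[1:-1]:
        if ch == "[":
            in_brackets = True
        elif ch == "]":
            in_brackets = False
        elif not in_brackets and ch in SHORTCUTS:
            ch = SHORTCUTS[ch]
        out.append(ch)
    return re.compile("".join(out), re.IGNORECASE)


def regexp_matches(target: str, words: list[str]) -> list[str]:
    """Return the words that the regexp target matches in full."""
    pattern = regexp_target_to_regex(target)
    return [w for w in words if pattern.fullmatch(w)]
```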

Phrases

Word-phrases that are sometimes single words, like hari bulan / haribulan, can be treated as single words.
Word couples like sini-sana or tempik-sorak are treated as separate targets.
Reduplicated forms like puji-pujian are always single targets that can be specified using the Morphology search.
Infixed forms like turun-temurun, gilang-gemilang, etc. can be found by searching for turun, gilang etc.

Simple searches

Use a Simple search for simple two-word phrases. To find Sultan Ahmad, (ber)jual-beli or gegak-gempita, enter the components as the two targets.

Optioned searches

To find wider associations, use the Optioned search.

(This also applies to Morphology and Chronology searches.)

For instance, to find makan occurring near minum:

set the two targets as makan and minum;
select ‘with’
specify the range over which you want to look for an association (usually a range of 4 words gives worthwhile results);
optionally constrain the search to look for associated words only on the right or the left of the first target.

absences

To find contexts in which makan occurs without minum, make the same Optioned search but click ‘excluding’ and all the remaining occurrences of makan will be reported.
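The 'with' and 'excluding' options can be pictured as a window check around each occurrence of the first target. In this minimal Python sketch the text is a plain list of word tokens, targets are compared as exact strings (a real plain target would also accept affixed forms), and `associations` with its `span`, `side` and `excluding` parameters is a hypothetical function, not part of the MCP.

```python
def associations(tokens: list[str], target1: str, target2: str, span: int = 4,
                 side: str = "both", excluding: bool = False) -> list[int]:
    """Positions of target1 that have target2 within `span` words ('with'),
    or, if excluding=True, the positions that do not ('excluding')."""
    if side not in ("both", "left", "right"):
        raise ValueError("side must be 'both', 'left' or 'right'")
    hits = []
    for i, token in enumerate(tokens):
        if token != target1:
            continue
        window = []
        if side in ("both", "left"):
            window += tokens[max(0, i - span):i]
        if side in ("both", "right"):
            window += tokens[i + 1:i + 1 + span]
        found = target2 in window
        if found != excluding:   # keep matches for 'with', non-matches for 'excluding'
            hits.append(i)
    return hits
```

Used on "maka ia makan dan minum . kemudian ia makan nasi sahaja", a range of 2 associates only the first makan with minum, while a range of 4 lets the second makan reach back to the minum of the previous clause: a wider range finds more associations, but more of them are accidental.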

Morphology

Use a Morphology search to find particular combinations of affixes:

with a specified root
with any root

The menus

Use the menus to specify combinations of affixes.

In all menus:

− means no affix appears in this position.

~ means any affix in the menu may be present or absent.

a selected affix must be present.

alternative affixes can be selected from the menu by using "control click" (except for reduplications).

prefixes

The two menus available for prefixes will support searches for multiple prefixes: e.g. diper-, seber-, sese-, etc.

In the first menu, di/meN- will search for any of the affixes in this class (di-, ku-, kau-, me-, meng-, men-, mem-), but will NOT capture the root form of the verb used in constructions like 'yang adinda hendaki'.

reduplication

The menu:

− excludes reduplicated forms
~ any reduplicated or infixed forms: use other menus to specify the target further, e.g. ber- for reduplicated ber- forms.
R-R reduplications of the root: buah-buah, buah-buahan: use other menus to specify the target further, e.g. ke- -an for kemalu-maluan etc.
S-S reduplications of the stem: buahan-buahan (but not buah-buahan): use other menus to specify the target further, e.g. ke- -an for kemaluan-kemaluan etc.
R-ber/meN R reciprocity: kata-berkata, tulis-menulis, etc.: use the first menu to specify the ber- or meN- variants.
ber R-R an mutuality: bermain-mainan, etc.: cannot be specified further.
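The menu categories can be illustrated with a small Python classifier for hyphenated forms whose root is already known. The meN- assimilation rules are the usual textbook ones, simplified (irregular forms exist), and `classify_reduplication`, `men_form` and `ber_form` are hypothetical helpers, not MCP code. The sketch reports which of the ber- and meN- variants of R-ber/meN R it found; prefixed reduplications such as kemalu-maluan or bermain-mainan fall outside it and return None.

```python
from typing import Optional

VOWELS = "aeiou"


def men_form(root: str) -> str:
    """Attach meN- to a root using the usual (simplified) nasal assimilation rules."""
    first, rest = root[0], root[1:]
    if first in VOWELS or first in "gh" or root.startswith("kh"):
        return "meng" + root        # ambil -> mengambil
    if first == "k":
        return "meng" + rest        # kata -> mengata
    if first == "p":
        return "mem" + rest         # pukul -> memukul
    if first in "bfv":
        return "mem" + root         # baca -> membaca
    if first == "t":
        return "men" + rest         # tulis -> menulis
    if first in "dcjz" or root.startswith("sy"):
        return "men" + root         # dengar -> mendengar
    if first == "s":
        return "meny" + rest        # sapu -> menyapu
    return "me" + root              # l, m, n, r, w, y: lihat -> melihat


def ber_form(root: str) -> str:
    """Attach ber-, reduced to be- before r (rasa -> berasa)."""
    return ("be" if root.startswith("r") else "ber") + root


def classify_reduplication(form: str, root: str) -> Optional[str]:
    """Label a hyphenated form with a reduplication menu option, or return None."""
    if form.count("-") != 1:
        return None
    left, right = form.split("-")
    if left == root and right.startswith(ber_form(root)):
        return "R-ber R"            # kata-berkata
    if left == root and right.startswith(men_form(root)):
        return "R-meN R"            # tulis-menulis
    if left == root and right.startswith(root):
        return "R-R"                # buah-buah, buah-buahan
    if left == right and root in left:
        return "S-S"                # buahan-buahan
    return None
```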

Root undefined

In searches for morphological forms it is convenient to leave the root undefined: this is done by clicking the "root unspecified" button.

In this case, a nominal root is posited by the search program. The nominal root comprises a minimum of 4, in the patterns CVC, CCV or ~V~V (where C is any consonant grapheme and V any vowel). This should pick up all relevant forms, but it will also pull in some spurious ones. In particular, the -i suffix cannot be conveniently retrieved in this way: is sakai based on the root saka? Human oversight is still needed.

Syntax

Use the two targets to find syntactical relationships.

E.g. to find combinations of suruh with a following me- or di- form, enter suruh as target 1; then for target 2 select "root unspecified" and select di/meN from the first menu. Using the middle panel, select "present", "proximity", "1" to the right, and activate right associations (or deactivate left associations).

The variations are endless. You may enter "root unspecified" roots in both targets, for instance, to look for contexts in which me- and ber- forms occur side by side.

Chronology

The chronology of classical Malay texts is mostly far from clear, and is further muddied by the problems of variant versions and by the gap between the dating of manuscripts and the dating of the texts that the manuscripts contain. Notwithstanding these uncertainties, a chronological order has been adopted for the texts available in the MCP. It is based on an estimate of the date at which a text was written or compiled, not the date of the manuscript from which the MCP text has been derived. For these estimates, see the chronological list of texts.

Rhymes

To find rhyme words, use the list of rhyme words appearing in each text. These lists are accessible through the list of texts.

Reports

Location references

They typically identify the text together with the page and line, or the stanza and verse. The system used for each text is explained in its Bibliography.

Tuah 402:33: By hovering the mouse over a reference, the full title and dates of text and manuscript will be revealed. Clicking on a reference will open a window giving bibliographical information.
Pwng p117: Similarly, but clicking on this form of link-reference will open a window showing a text extract, illustration, or the like.

Annotations

The aim is to keep the texts as uncluttered as possible.

* an asterisk will open a pop-up window providing a note on a disputed reading, difficulty, or explanation. These are only relevant for detailed consideration of the text.

(These symbols may appear grey *© or highlighted as hyperlinks *© depending on your browser.)

¶ § :: / | , ; : -- Elements in grey are punctuation added to the text to indicate verses, lists, paragraphs and the like. A paragraph ¶ or section § marker indicates the beginning or end of an independent section of a composite text: for instance, of a letter in the collection Warkah Warisan Melayu.

The verse markings are:

| beginning and end of stanzas;

; end of a couplet;

, end of a verse;

/ end of a rhythmic phrase;

: separating items in a list;

Also in grey are sections of the text not included in the indexes -- usually Arabic quotations.

Punctuation

The printed texts follow differing punctuation conventions. As far as possible, these are adjusted to a common pattern:

[ ] enclose material added to the text by the editor.

( ) enclose material in the text which the editor would omit.

In complex texts:

« » enclose material quoted from the base text which is being glossed.

[[ ]] enclose material added as marginal commentary to the base text.

Display

The best general purpose display is the Key Words In Context (KWIC) format. This is an economical and effective way of giving a quick overview of the word in its semantic contexts. With the contexts sorted alphabetically, the KWIC display also readily reveals the recurrence of conventional phrases and formulaic expressions.
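A KWIC listing is simple to produce from a tokenised text. In this Python sketch the key word is matched exactly (ignoring case), each context is `width` words on either side, and the lines are sorted on the right-hand context; the sort key and the function name `kwic` are assumptions, since the guide does not say which context the MCP sorts on.

```python
def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[str]:
    """Return KWIC lines for `keyword`, sorted alphabetically on the right-hand context."""
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((right.lower(), left, token, right))
    rows.sort()
    # Right-align the left contexts so that the key words form a single column.
    pad = max((len(left) for _, left, _, _ in rows), default=0)
    return [f"{left:>{pad}}  {token}  {right}" for _, left, token, right in rows]
```

Sorting on the right-hand context is what makes a recurring phrase such as raja pun ... line up one entry beneath another.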

If you wish to work with broader contexts than the KWIC format allows, switch to "fuller contexts".

Organisation

displaying phrases

Formulaic phrases will cluster better if the list of forms found is broken up into separate blocks according to the precise form of the word used in target 1 or in both targets 1 & 2.

When the association is not a set phrase but an association of ideas, it may show up better if the list is organised by "text".

displaying distributions

The changing emphases within a text may be revealed by how a key word or phrase is distributed within it. To get this perspective, organise the list by "text".

For shifting meanings of a word over several texts and across time, an alternative is to organise the list by "dates". Be aware, however, that the dates of earlier texts are often indefinite and hypothetical. See the note on this above.

displaying patterns

With a morphological query, if the target is a syntactic form, the results may be best displayed as a single undivided listing: select "0".

Notes

Text notes are generally limited to editorial matters: other possible readings, likely scribal errors, and so on. When possible, the Jawi forms of proper names or difficult words may be given.

Jawi transcription

transcription table

a b t th j c ḥ kh d dh r z s sy ṣ ḍ ṭ ẓ ` gh ng f p q k g l m n w h ´ y ny

	alif maksurah	.a
	ta marbutah	.t" / .h"
	angka dua	.2
	baris di atas / fathah	.ba .ta .tha .ja .ha etc.
	baris di bawah / kasrah	.bi
	baris di hadapan / dammah	.bu
	tanda mati / jazm	.b°
	tasydid	.bb

examples

m.r.y.k.´.y.t	mereka itu
m.l.k.w.2.k.n	melaku-lakukan
fi.ra.a.sa.t"°	firasat

Caveats

Integrity

The integrity of the search results depends on several factors:

the editorial decisions of the scholars responsible for editing and publishing the texts
the inputting of the texts, through manual transcriptions or optical character recognition
the indexing of the texts, which involves, in particular, identifying non-standard spellings and managing word divisions
the algorithms used by the search program.

There is room for error at every step along the way. Every care has been taken to develop search algorithms that will produce reliable results -- neither too likely to include unwanted forms, nor too likely to pass over desired forms. There is no guarantee that this will be achieved in all cases. The more undefined elements there are in the search, the more room there is for slippage. Moreover the success of the algorithm depends upon the quality of the data.
