Table Of ContentDiscovering Global Patterns in Linguistic Networks through
Spectral Analysis: A Case Study of the Consonant Inventories
∗
Animesh Mukherjee
Indian InstituteofTechnology,Kharagpur
animeshm@cse.iitkgp.ernet.in
MonojitChoudhury and RaviKannan
MicrosoftResearch India
monojitc,kannan @microsoft.com
{ }
9
Abstract Most of the existing studies on linguistic
0
0 networks, however, focus only on the local
Recent research has shown that language
2
structural properties such as the degree and
and the socio-cognitive phenomena asso-
n clustering coefficient of the nodes, and shortest
a ciated with it can be aptly modeled and
paths between pairs of nodes. On the other hand,
J visualized through networks of linguistic
5 entities. However, most of the existing although it is a well known fact that the spectrum
1 of a network can provide important information
works on linguistic networks focus only
about its global structure, theuse of this powerful
] on the local properties of the networks.
L mathematical machinery to infer global patterns
This study is an attempt to analyze the
C in linguistic networks is rarely found in the liter-
structure of languages via a purely struc-
.
s ature. Note that spectral analysis, however, has
tural technique, namely spectral analysis,
c
been successfully employed in the domains of bi-
[ which isideally suited for discovering the
ological and social networks (Farkasetal.,2001;
1 global correlations in a network. Appli-
Gkantsidisetal.,2003; BanerjeeandJost,2007).
v cation of this technique to PhoNet, the
6 In the context of linguistic networks,
co-occurrence network of consonants, not
1 (BelkinandGoldsmith,2002) is the only work
2 onlyrevealsseveralnaturallinguisticprin-
we are aware of that analyzes the eigenvectors
2 ciples governing the structure of the con-
. to obtain a two dimensional visualize of the
1 sonantinventories,butisalsoabletoquan-
0 network. Nevertheless, the work does not study
tify their relative importance. We believe
9 thespectrum ofthegraph.
0 that this powerful technique can be suc-
: cessfully applied, in general, to study the The aim of the present work is to demon-
v
i structure ofnaturallanguages. strate the use of spectral analysis for discover-
X
ing the global patterns in linguistic networks.
r 1 Introduction
a These patterns, in turn, are then interpreted in
Language and the associated socio-cognitive the light of existing linguistic theories to gather
phenomena can be modeled as networks, where deeper insights into the nature of the under-
the nodes correspond to linguistic entities and the lying linguistic phenomena. We apply this
edges denote the pairwise interaction or relation- rather generic technique to find the principles
shipbetweentheseentities. Thestudyoflinguistic that are responsible for shaping the consonant
networks has been quite popular in the recent inventories, which is a well researched prob-
timesandhasprovided uswithseveral interesting lem in phonology since 1931 (Trubetzkoy, 1931;
insights into the nature of language (see Choud- LindblomandMaddieson, 1988; Boersma,1998;
hury and Mukherjee (toappear) for an extensive Clements,2008). The analysis is carried out
survey). Examples include study of the Word- on a network defined in (Mukherjee etal.,2007),
Net (SigmanandCecchi,2002), syntactic depen- where the consonants are the nodes and there is
dency network of words (Ferrer-i-Cancho, 2005) an edge between two nodes u and v if the con-
and network of co-occurrence of consonants sonants corresponding to them co-occur in a lan-
in sound inventories (Mukherjee etal.,2008; guage. The number of times they co-occur across
Mukherjee etal.,2007). languages define the weight of the edge. We ex-
∗Thisresearchhasbeenconductedduringtheauthor’sin- plain the results obtained from the spectral analy-
ternshipatMicrosoftResearchIndia. sisofthenetworkpost-facto usingthreelinguistic
principles. Themethod alsoautomatically reveals isknownastheadjacencymatrix,issymmetricfor
the quantitative importance of each of these prin- anundirected graphandhavebinaryentriesforan
ciples. unweightedgraph. λisaneigenvalue ofAifthere
It is worth mentioning here that earlier re- isann-dimensional vectorxsuchthat
searchers have also noted the importance of the
Ax = λx
aforementioned principles. However, what was
not known was how much importance one should AnyrealsymmetricmatrixAhasn(possiblynon-
associate with each of these principles. We also distinct) eigenvalues λ0 λ1 ... λn−1, and
≤ ≤ ≤
notethatthetechnique ofspectralanalysisneither correspondingneigenvectorsthataremutuallyor-
explicitlynorimplicitlyassumesthattheseprinci- thogonal. Thespectrumofagraphisthesetofthe
plesexistorareimportant,butdeducesthemauto- distinct eigenvalues of the graph and their corre-
matically. Thus, we believe that spectral analysis sponding multiplicities. It is usually represented
is a promising approach that is well suited to the as a plot with the eigenvalues in x-axis and their
discovery of linguistic principles underlying a set multiplicities plotted inthey-axis.
of observations represented as a network of enti- The spectrum of real and random graphs dis-
ties. The fact that the principles “discovered” in play several interesting properties. Banerjee and
thisstudyarealreadywellestablishedresultsadds Jost(2007)reportthespectrumofseveralbiologi-
to the credibility of the method. Spectral analysis cal networks that are significantly different from
oflargelinguisticnetworksinthefuturecanpossi- the spectrum of artificially generated graphs2.
blyrevealhitherto unknownuniversal principles. Spectral analysis is also closely related to Prin-
The rest of the paper is organized as follows. cipal Component Analysis and Multidimensional
Sec. 2 introduces the technique of spectral anal- Scaling. If the first few (say d) eigenvalues of a
ysis of networks and illustrates some of its ap- matrix are muchhigher than the rest ofthe eigen-
plications. The problem of consonant inventories values, then it can be concluded that the rows of
and how itcanbe modeled and studied within the the matrix can be approximately represented as
framework oflinguistic networks aredescribed in linear combinations of dorthogonal vectors. This
Sec. 3. Sec. 4 presents the spectral analysis of further implies that the corresponding graph has
the consonant co-occurrence network, the obser- a few motifs (subgraphs) that are repeated a large
vations and interpretations. Sec. 5 concludes by number of time to obtain the global structure of
summarizing the work and the contributions and thegraph(BanerjeeandJost,toappear).
listingoutfutureresearchdirections. Spectral properties are representative of an n-
dimensional average behavior of the underlying
2 A Primerto Spectral Analysis system, thereby providing considerable insight
intoitsglobalorganization. Forexample,theprin-
Spectral analysis1 is a powerful tool capable of
cipaleigenvector(i.e.,theeigenvectorcorrespond-
revealing the global structural patterns underly-
ing to the largest eigenvalue) is the direction in
ing an enormous and complicated environment
which the sum of the square of the projections
of interacting entities. Essentially, it refers to
of the row vectors of the matrix is maximum. In
the systematic study of the eigenvalues and the
fact,theprincipaleigenvectorofagraphisusedto
eigenvectors of the adjacency matrix of the net-
compute the centrality of the nodes, which is also
work of these interacting entities. Here we shall
knownasPageRankinthecontextofWWW.Sim-
brieflyreviewthebasicconceptsinvolvedinspec-
ilarly, the second eigen vector component is used
tral analysis and describe some of its applications
forgraphclustering.
(see (Chung,1994; KannanandVempala,2008)
In the next twosections wedescribe how spec-
fordetails).
tral analysis can be applied to discover the orga-
Anetworkoragraphconsisting ofnnodes(la-
nizing principles underneath the structure of con-
beledas1throughn)canberepresentedbyan n
sonantinventories.
×
squarematrixA,wheretheentrya representsthe
ij 2Banerjee and Jost (2007) report the spectrum of the
weightoftheedgefromnodeitonodej. A,which
graph’s Laplacian matrix rather than the adjacency matrix.
It isincreasingly popular these days to analyze the spectral
1Thetermspectralanalysisisalsousedinthecontextof propertiesofthegraph’sLaplacianmatrix.However,forrea-
signalprocessing,whereitreferstothestudyofthefrequency sonsexplainedlater,herewewillbeconductspectralanalysis
spectrumofasignal. oftheadjacencymatrixratherthanitsLaplacian.
Figure1: Illustration ofthenodesandedgesofPlaNetandPhoNetalongwiththeirrespectiveadjacency
matrixrepresentations.
3 Consonant Co-occurrence Network languagenodev V (representing thelanguage
l L
∈
l)andaconsonantnodev V (representing the
c C
The most basic unit of human languages are the ∈
consonant c) iff the consonant c is present in the
speech sounds. The repertoire of sounds that
inventoryofthelanguagel. Thisnetworkiscalled
make up the sound inventory of a language are
the Phoneme-Language Network or PlaNet and
not chosen arbitrarily even though the speak-
represent the connections between the language
ers are capable of producing and perceiving a and the consonant nodes through a 0-1 matrix A
plethora of them. In contrast, these invento-
asshownbyahypotheticalexampleinFig.1. Fur-
ries show exceptionally regular patterns across
ther, in (Mukherjee etal.,2007), the authors de-
the languages of the world, which is in fact,
fine the Phoneme-Phoneme Network or PhoNet
a common point of consensus in phonology.
astheone-modeprojectionofPlaNetontothecon-
th
Right from the beginning of the 20 century, sonant nodes, i.e., a network G = hVC,Ecc′i,
there have been a large number of linguisti- wherethenodesaretheconsonants andtwonodes
cally motivated attempts (Trubetzkoy, 1969; vc andvc′ arelinkedbyanedgewithweightequal
LindblomandMaddieson, 1988; Boersma,1998; to the number of languages in which both c and
Clements,2008) to explain the formation c′ occur together. In other words, PhoNet can be
of these patterns across the consonant in- expressed as a matrix B (see Fig. 1) such that
ventories. More recently, Mukherjee and B = AAT D where D is a diagonal matrix
his colleagues (Choudhury etal.,2006; −
with its entries corresponding to the frequency of
Mukherjee etal.,2007; Mukherjee etal.,2008)
occurrence of the consonants. Similarly, we can
studiedthisproblemintheframeworkofcomplex
also construct the one-mode projection of PlaNet
networks. Since here we shall conduct a spectral
onto the language nodes (which we shall refer to
analysis of the network defined in Mukherjee et
as the Language-Language Graph or LangGraph)
al. (2007), we briefly survey the models and the can be expressed as B′ = ATA D′, where D′
importantresultsoftheirwork. −
isadiagonalmatrixwithitsentriescorresponding
Choudhury et al. (2006) introduced a bipartite
to the size of the consonant inventories for each
networkmodelfortheconsonantinventories. For-
language.
mally,asetofconsonantinventoriesisrepresented
asagraph G = V ,V ,E ,wherethenodes in The matrix A and hence, B and B′ have
L C lc
h i
onepartitioncorrespondtothelanguages(V )and been constructed from the UCLA Phono-
L
thatintheotherpartitioncorrespond totheconso- logical Segment Inventory Database (UP-
nants (V ). There is an edge (v , v ) between a SID) (Maddieson, 1984) that hosts the consonant
C l c
inventories of 317 languages with a total of scale. Someoftheimportantobservationsthatone
541 consonants found across them. Note that, canmakefromtheseresultsareasfollows.
UPSID uses articulatory features to describe First,themajorbulkoftheeigenvaluesarecon-
the consonants and assumes these features to be centrated at around 0. This indicates that though
binary-valued, which in turn implies that every the order of B is 541 541, its numerical rank is
×
consonant can be represented by a binary vector. quite low. Second, there are at least a few very
Later on, we shall use this representation for our large eigenvalues that dominate the entire spec-
experiments. trum. In fact, 89% of the spectrum, or the square
By construction, we have VL = 317, VC = of the Frobenius norm, is occupied by the princi-
| | | |
541, Elc = 7022, and Ecc′ = 30412. Con- pal(i.e.,thetopmost)eigenvalue,92%isoccupied
| | | |
sequently, the order of the matrix A is 541 by the first and the second eigenvalues taken to-
317 and that of the matrix B′ is 541 gether, while 93% is occupied by the first three
× ×
541. It has been found that the degree distri- taken together. The individual contribution of the
bution of both PlaNet and PhoNet roughly in- other eigenvalues to the spectrum is significantly
dicate a power-law behavior with exponential lower than that of the top three. Third, the eigen-
cut-offs towards the tail (Choudhury etal.,2006; valuesoneitherendsofthespectrumtendtodecay
Mukherjee etal.,2007). Furthermore, PhoNet is gradually,mostlyindicatingapower-lawbehavior.
also characterized by a very high clustering co- The power-law exponents at the positive and the
efficient. The topological properties of the two negative ends are -1.33 (the R2 value of the fit is
networks and the generative model explaining 0.98)and-0.88(R2 0.92)respectively.
∼
the emergence of these properties are summa-
The numerically low rank of PhoNet suggests
rizedin(Mukherjeeetal.,2008). However,allthe
that there are certain prototypical structures that
above properties are useful in characterizing the frequently repeatthemselvesacrosstheconsonant
local patterns ofthe network and provide very lit- inventories, thereby, increasing the number of 0
tleinsightaboutitsglobalstructure. eigenvalues to a large extent. In other words, all
therowsofthematrixB(i.e.,theinventories) can
4 Spectral AnalysisofPhoNet
be expressed as the linear combination of a few
In this section we describe the procedure and re- independent rowvectors,alsoknownasfactors.
sults ofthespectral analysis ofPhoNet. Webegin Furthermore, the fact that the principal eigen-
with computation of the spectrum of PhoNet. Af- valueconstitutes89%oftheFrobeniusnormofthe
tertheanalysisofthespectrum,wesystematically spectrum implies that there exist one very strong
investigate the top few eigenvectors of PhoNet organizing principle which should be able to ex-
and attempt to characterize their linguistic signif- plainthebasicstructureoftheinventoriestoavery
icance. In the process, we also analyze the corre- good extent. Since the second and third eigen-
sponding eigenvectors of LanGraph that helps us values are also significantly larger than the rest
incharacterizing theproperties oflanguages. of the eigenvalues, one should expect two other
organizing principles, which along with the basic
4.1 SpectrumofPhoNet
principle,shouldbeabletoexplain,(almost)com-
Using a simple Matlab script we compute the pletely, the structure of the inventories. In order
spectrum (i.e., the list of eignevalues along with to “discover” these principles, we now focus our
their multiplicities) of the matrix B correspond- attention tothefirstthreeeigenvectors ofPhoNet.
ing to PhoNet. Fig. 2(a) shows the spectral plot,
which has been obtained through binning3 with a 4.2 TheFirstEigenvector ofPhoNet
fixedbinsizeof20. Inordertohaveabettervisu-
Fig. 2(d) shows the first eigenvector component
alization ofthe spectrum, in Figs. 2(b) and (c) we
for each consonant node versus its frequency of
furtherplotthetop50(absolute)eigenvaluesfrom
occurrenceacrossthelanguageinventories(i.e.,its
thetwoendsofthespectrumversustheindexrep-
degreeinPlaNet). Thefigureclearlyindicatesthat
resenting their sorted order in doubly-logarithmic
the two are highly correlated (r = 0.99), which in
3Binning istheprocess of dividing theentirerange of a turn means that 89% of the spectrum and hence,
variable into smaller intervals and counting the number of
the organization of the consonant inventories, can
observationswithineachbinorinterval. Infixedbinning,all
theintervalsareofthesamesize. be explained to a large extent by the occurrence
Figure 2: Eigenvalues and eigenvectors of B. (a) Binned distribution of the eigenvalues (bin size =20)
versustheirmultiplicities. (b)thetop50(absolute)eigenvaluesfromthepositiveendofthespectrumand
their ranks. (c) Sameas(b) forthe negative end ofthespectrum. (d), (e) and (f)respectively represents
thefirst,secondandthethirdeigenvectorcomponentsversustheoccurrencefrequencyoftheconsonants.
frequency of the consonants. Thequestion arises: trix X with X = Cf f for some vector f =
i,j i j
⊤
Doesthistellussomethingspecialaboutthestruc- (f ,f ,...f ) that represents the frequency of
1 2 n
tureofPhoNetorisitalwaysthecaseforanysym- thenodes andanormalization constant C. Thisis
metric matrix that the principal eigenvector will whatwerefertoas”proportionate co-occurrence”
be highly correlated with the frequency? We as- because the extent of co-occurrence between the
sert that the former is true, and indeed, the high nodes i and j (which is X or the weight of the
i,j
correlation between the principal eigenvector and edge between i and j) is exactly proportionate to
the frequency indicates high “proportionate co- the frequencies of the two nodes. The principal
occurrence” -atermwhichwewillexplain. eigenvector inthiscaseisf itself, andthus, corre-
Toseethis,considerthefollowing2n 2nma- lates perfectly with the frequencies. Unlike this
×
trixX hypothetical matrix X, PhoNet has all 0 entries
in the diagonal. Nevertheless, this perturbation,
0 M 0 0 0 ...
1 which is equivalent to subtracting f2 from the ith
M 0 0 0 0 ... i
1 diagonal,seemstobesufficientlysmalltopreserve
X = 0 0 0 M2 0 ... the“proportionate co-occurrence” behavior ofthe
0 0 M2 0 0 ... adjacencymatrixtherebyresultingintoahighcor-
.. .. .. .. .. ..
. . . . . . relationbetweentheprincipaleigenvector compo-
nentandthefrequencies.
where X = X = M for all odd
i,i+1 i+1,i (i+1)/2 On the other hand, to construct the Lapla-
i and 0 elsewhere. Also, M1 > M2 > ... > cianmatrix,wewouldhavesubtracted f n f
i j=1 j
M 1. Essentially, this matrix represents a P
n ≥ from the ith diagonal entry, which is a much
graph which is a collection of n disconnected
larger quantity than f2. In fact, this operation
edges, each having weights M , M , and so on. i
1 2 would have completely destroyed the correlation
It is easy to see that the principal eigenvector of
between the frequency and the principal eigen-
this matrix is (1/√2,1/√2,0,0,...,0)⊤, which
vector component because the eigenvector corre-
ofcourseisverydifferentfromthefrequencyvec-
spondingtothesmallest4 eigenvalue oftheLapla-
⊤
tor: (M ,M ,M ,M ,...,M ,M ) .
1 1 2 2 n n
At the other extreme, consider an n n ma- 4Theroleplayedbythetopeigenvaluesandeigenvectors
×
⊤
cianmatrixis[1,1,...,1] . positive component of the second eigenvector as
Sincethefirsteigenvector ofBisperfectly cor- MAX and the absolute maximum value of the
+
related with the frequency of occurrence of the negative component as MAX−. If the absolute
consonants across languages it is reasonable to value of apositive component is less than 15% of
argue that there is a universally observed innate MAX then assign a neutral class to the corre-
+
preference towards certain consonants. This pref- spondingconsonant; elseassignitapositiveclass.
erence is often described through the linguistic Denote the set of consonants in the positive class
concept of markedness, which in the context of by C . Similarly, if the absolute value of a nega-
+
phonology tells us that the substantive conditions tive component is less than 15% of MAX− then
that underlie the human capacity of speech pro- assign aneutral class tothecorresponding conso-
ductionandperception renderscertainconsonants nant;elseassignitanegativeclass. Denotetheset
morefavorabletobeincludedintheinventorythan ofconsonants inthenegative classbyC−.
some other consonants (Clements,2008). We ob- (iii) Using the above training set of the clas-
serve that markedness plays avery important role sified consonants (represented as boolean fea-
inshapingtheglobalstructureoftheconsonantin- ture vectors) learn a decision tree (C4.5 algo-
ventories. Infact,ifwearrangetheconsonantsina rithm (Quinlan, 1993)) to determine the features
non-increasing order of the first eigenvector com- that are responsible for the split of the medium
ponents (which is equivalent to increasing order frequency zone into the negative and the positive
of statistical markedness), and compare the set of classes.
consonants present in an inventory of size s with
Fig. 3(a) shows the decision rules learnt from
that of the first s entries from this hierarchy, we
the above training set. It is clear from these rules
find that the two are, on an average, more than
that the split into C− and C+ has taken place
50% similar. This figure is surprisingly high be-
mainly based on whether the consonants have
cause, in spite of the fact that s 541, on an
∀s ≪ 2 the combined “dental alveolar” feature (negative
average s consonants in an inventory are drawn
2 class) or the “dental” and the “alveolar” features
fromthefirstsentriesofthemarkednesshierarchy
separately (positive class). Such a combined fea-
(asmallset),whereastherest s aredrawnfromthe
2 tureisoftentermedambiguousanditspresencein
remaining(541 s)entries(amuchlargerset).
− a particular consonant c of a language l indicates
Thehighdegreeofproportionateco-occurrence
thatthespeakers oflareunabletomakeadistinc-
in PhoNet implied by this high correlation be-
tion as to whether c is articulated with the tongue
tweentheprincipaleigenvectorandfrequencyfur-
against the upper teeth or the alveolar ridge. In
ther indicates that the innate preference towards
contrast,ifthefeaturesarepresentseparately then
certain phonemes is independent of the presence
the speakers are capable of making this distinc-
ofotherphonemesintheinventory ofalanguage.
tion. In fact, through the following experiment,
we find that the consonant inventories of almost
4.3 TheSecondEigenvector ofPhoNet
allthelanguagesinUPSIDgetclassifiedbasedon
Fig.2(e)showsthesecondeigenvectorcomponent
whethertheypreservethisdistinction ornot.
foreachnodeversustheiroccurrencefrequency. It
ExperimentII
isevidentfromthefigurethattheconsonantshave ′ T ′
(i) Construct B = A A – D (i.e., the adjacency
been clustered into three groups. Those that have
matrixofLangGraph).
averyloworaveryhighfrequency clubaround0 ′
(ii) Compute the second eigenvector of B. Once
whereas, the medium frequency zone has clearly
again, the positive and the negative components
split into twoparts. In order to investigate the ba-
splitthelanguagesintotwodistinctgroupsL and
+
sisforthissplitwecarryoutthefollowingexperi-
L− respectively.
ment.
(iii) For each language l L count the num-
+
ExperimentI ∈
ber of consonants in C that occur in l. Sum up
+
(i)Removeallconsonants whosefrequencyofoc-
the counts for all the languages in L and nor-
+
currence acrosstheinventories isverylow(<5).
malize this sum by L C . Similarly, perform
+ +
(ii) Denote the absolute maximum value of the | || |
thesamestepforthepairs(L+,C−),(L−,C+)and
inthespectral analysisof theadjacencymatrixiscompara- (L−,C−).
bletothatofthesmallesteigenvaluesandthecorresponding
eigenvectorsoftheLaplacianmatrix(Chung,1994) Fromtheaboveexperiment,thevaluesobtained
Figure 3: Decision rules obtained from the study of (a) the second, and (b) the third eigenvectors. The
classification errorsforboth(a)and(b)arelessthan15%.
for the pairs (i) (L+,C+), (L+,C−) are 0.35, 0.08 the consonants from C+ and C− in the languages
respectively, and (ii) (L−,C+), (L−,C−) are 0.07, of L−. Therefore, it can be argued that the pres-
0.32 respectively. This immediately implies that enceoftheconsonants fromC− inalanguagecan
almost all the languages in L preserve the den- (phonologically) implythepresence oftheconso-
+
tal/alveolar distinction whilethoseinL− donot. nantsfromC+,butnotviceversa. Wedonotfind
anysuchaforementionedpatternforthefourthand
4.4 TheThirdEigenvector ofPhoNet thehighereigenvector components.
We next investigate the relationship between the
4.5 ControlExperiment
third eigenvector components of B and the occur-
rencefrequencyoftheconsonants (Fig.2(f)). The Asacontrolexperimentwegeneratedasetofran-
consonants are once again found to get clustered dom inventories and carried out the experiments
into three groups, though not as clearly as in the I and II on the adjacency matrix, B , of the ran-
R
previouscase. Therefore,inordertodeterminethe dom version of PhoNet. We construct these in-
basis of the split, we repeat experiments I and II. ventories as follows. Let the frequency of occur-
Fig.3(b)clearlyindicatesthatinthiscasethecon- rence for each consonant c in UPSID be denoted
sonants in C lack the complex features that are byf . Lettherebe317binseachcorresponding to
+ c
considered difficult for articulation. On the other alanguageinUPSID.f binsarethenchosenuni-
c
hand, the consonants in C− are mostly composed formly at random and the consonant c is packed
ofsuchcomplexfeatures. Thevaluesobtained for into these bins. Thus the consonant inventories
the pairs (i) (L+,C+), (L+,C−) are 0.34, 0.06 re- of the 317 languages corresponding to the bins
spectively, and (ii) (L−,C+), (L−,C−) are 0.19, are generated. Note that this method of inventory
0.18 respectively. This implies that while there is constructionleadstoproportionate co-occurrence.
aprevalenceoftheconsonantsfromC inthelan- Consequently, thefirsteigenvector components of
+
guagesofL+,theconsonants fromC− arealmost BR are highly correlated to the occurrence fre-
absent. However, there is an equal prevalence of quency of the consonants. However, the plots of
the second and the third eigenvector components pologies that are predominant in this case con-
versustheoccurrencefrequencyoftheconsonants sist of (a) languages using only those sounds that
indicateabsolutely nopatternthereby, resulting in have simple features (e.g., plosives), and (b) lan-
a large number of decision rules and very high guages using sounds with complex features (e.g.,
classification errors(upto50%). lateral,ejectives,andfricatives)thatautomatically
imply the presence of the sounds having sim-
5 DiscussionandConclusion ple features. The distinction between the simple
andcomplexphonological features isaverycom-
Are there any linguistic inferences that can be
monhypothesis underlying theimplicational hier-
drawn from the results obtained through the
archy and the corresponding typological classifi-
study of the spectral plot and the eigenvectors of
cation (Clements,2008). In this context, Locke
PhoNet? In fact, one can correlate several phono-
and Pearson (1992) remark that “Infants heavily
logical theories to the aforementioned observa-
favor stop consonants over fricatives, and there
tions, which have been construed by the past re-
arelanguages thathavestopsandnofricativesbut
searchers throughveryspecificstudies.
no languages that exemplify the reverse pattern.
Oneofthemostimportantproblemsindefining [Such] ‘phonologically universal’ patterns, which
a feature-based classificatory system is to decide cut across languages and speakers are, in fact, the
when a sound in one language is different from phonetic properties of Homo sapiens.” (as quoted
a similar sound in another language. According in(Valleeetal.,2002)).
to Ladefoged (2005) “two sounds in different
Therefore, it turns out that the methodology
languages should be considered as distinct if we
presented here essentially facilitates the induction
can point to a third language in which the same
of linguistic typologies. Indeed, spectral anal-
twosounds distinguish words”. Thedental versus
ysis derives, in a unified way, the importance
alveolar distinction that we find to be highly in-
of these principles and at the same time quanti-
strumental in splitting the world’s languages into
fies their applicability in explaining the structural
two different groups (i.e., L+ and L− obtained
patterns observed across the inventories. In this
from the analysis of the second eigenvectors of
′ context, there are at least two other novelties of
B and B) also has a strong classificatory basis.
this work. The first novelty is in the systematic
It may well be the case that certain categories of
study of the spectral plots (i.e., the distribution of
sounds like the dental and the alveolar sibilants
the eigenvalues), which is in general rare for lin-
are not sufficiently distinct to constitute a reli-
guistic networks, although there have been quite
able linguistic contrast (see (Ladefoged, 2005)
a number of such studies in the domain of bi-
for reference). Nevertheless, by allowing the
ological and social networks (Farkasetal.,2001;
possibility for the dental versus alveolar distinc-
Gkantsidisetal.,2003; BanerjeeandJost,2007).
tion, one does not increase the complexity or
The second novelty is in the fact that there is
introduce any redundancy in the classificatory
not much work in the complex network literature
system. This is because, such a distinction is
thatinvestigatesthenatureoftheeigenvectorsand
prevalent in many other sounds, some of which
theirinteractions toinfertheorganizingprinciples
are (a) nasals in Tamil (Shanmugam,1972)
ofthesystemrepresented throughthenetwork.
and Malayalam (Shanmugam,1972;
LadefogedandMaddieson, 1996), (b) laterals To summarize, spectral analysis of the com-
in Albanian (Ladefoged andMaddieson, 1996), plex network of speech sounds is able to provide
and (c) stops in certain dialectal variations of a holistic as well as quantitative explanation of
Swahili (Haywardetal.,1989). Therefore, it theorganizing principles ofthesoundinventories.
is sensible to conclude that the two distinct Although this natural mathematical technique has
groups L+ and L− induced by our algorithm are beenheavilyusedinvariousotherdomains,wedo
true representatives of two important linguistic not know of any work that uses spectral analysis
typologies. for induction and understanding of linguistic ty-
The results obtained from the analysis of the pologies. This scheme for typology induction is
′
third eigenvectors of B and B indicate that im- notdependentonthespecificdatasetusedaslong
plicational universals also play a crucial role in as it is representative of the real world. Thus, we
determining linguistic typologies. The two ty- believethattheschemeintroducedherecanbeap-
plied as a generic technique for typological clas- [Ferrer-i-Cancho2005] R. Ferrer-i-Cancho. 2005. The
sifications of phonological, syntactic and seman- structureofsyntacticdependencynetworks:Insights
fromrecentadvancesinnetworktheory. InLevickij
tic networks; each of these are equally interesting
V. and Altmman G., editors, Problems of quantita-
from the perspective of understanding the struc-
tivelinguistics,pages60–75.
tureandevolutionofhumanlanguage,andaretop-
[Gkantsidisetal.2003] C. Gkantsidis, M. Mihail, and
icsoffutureresearch.
E. Zegura. 2003. Spectral analysis of internet
topologies. InINFOCOM’03,pages364–374.
Acknowledgement
[Haywardetal.1989] K.M.Hayward,Y.A.Omar,and
We would like to thank Kalika Bali for her valu- M. Goesche. 1989. Dental and alveolar stops
ableinputstowardsthelinguistic analysis. in Kimvita Swahili: An electropalatographicstudy.
AfricanLanguagesandCultures,2(1):51–72.
[KannanandVempala2008] R. Kannan and
References S. Vempala. 2008. Spectral Al-
gorithms. Course Lecture Notes:
[BanerjeeandJost2007] A.BanerjeeandJ.Jost. 2007.
http://www.cc.gatech.edu/˜vempala/spectral/spectral.pdf.
Spectral plots and the representation and interpre-
tation of biological data. Theory in Biosciences, [LadefogedandMaddieson1996] P. Ladefoged and
126(1):15–21. I. Maddieson. 1996. Sounds of the Worlds
Languages. Oxford:Blackwell.
[BanerjeeandJosttoappear] A.BanerjeeandJ.Jost. to
appear. Graphspectra as a systematic toolin com- [Ladefoged2005] P.Ladefoged. 2005. Featuresandpa-
putationalbiology. DiscreteAppliedMathematics. rametersfordifferentpurposes. InWorking Papers
inPhonetics,volume104,pages1–13.Dept.ofLin-
[BelkinandGoldsmith2002] M. Belkin and J. Gold- guistics,UCLA.
smith. 2002. Using eigenvectors of the bigram
[LindblomandMaddieson1988] B. Lindblom and
graph to infer morpheme identity. In Proceed-
I. Maddieson. 1988. Phoneticuniversalsin conso-
ingsoftheACL-02WorkshoponMorphologicaland
nant systems. In M. Hyman and C. N. Li, editors,
Phonological Learning, pages 41–47. Association
Language,Speech,andMind,pages62–78.
forComputationalLinguistics.
[LockeandPearson1992] J. L. Locke and D. M. Pear-
[Boersma1998] P.Boersma. 1998. FunctionalPhonol-
son. 1992. Vocal learning and the emergence of
ogy. TheHague:HollandAcademicGraphics.
phonologicalcapacity.Aneurobiologicalapproach.
In Phonological development. Models, Research,
[ChoudhuryandMukherjeetoappear] M. Choudhury
Implications,pages91–129.YorkPress.
and A. Mukherjee. to appear. The structure and
dynamics of linguistic networks. In N. Ganguly, [Maddieson1984] I. Maddieson. 1984. Patterns of
A. Deutsch, and A. Mukherjee, editors, Dynamics Sounds. CambridgeUniversityPress.
on and of Complex Networks: Applications to
Biology, Computer Science, Economics, and the [Mukherjeeetal.2007] A. Mukherjee, M. Choudhury,
SocialSciences.Birkhauser. A. Basu, andN. Ganguly. 2007. Modelingthe co-
occurrence principles of the consonant inventories:
[Choudhuryetal.2006] M. Choudhury, A. Mukherjee, A complex network approach. Int. Jour. of Mod.
A. Basu, and N. Ganguly. 2006. Analysis and Phys.C,18(2):281–295.
synthesisofthedistributionofconsonantsoverlan-
[Mukherjeeetal.2008] A. Mukherjee, M. Choudhury,
guages:Acomplexnetworkapproach. InCOLING-
A.Basu,andN.Ganguly. 2008. Modelingthestruc-
ACL’06,pages128–135.
ture and dynamics of the consonantinventories: A
complexnetwork approach. In COLING-08, pages
[Chung1994] F. R. K. Chung. 1994. Spectral Graph
601–608.
Theory. Number2 in CBMS RegionalConference
SeriesinMathematics.AmericanMathematicalSo-
[Quinlan1993] J. R. Quinlan. 1993. C4.5: Programs
ciety.
forMachineLearning. MorganKaufmann.
[Clements2008] G. N. Clements. 2008. The role of [Shanmugam1972] S. V. Shanmugam. 1972. Dental
features in speech sound inventories. In E. Raimy andalveolarnasalsin Dravidian. In Bulletin ofthe
andC. Cairns, editors, ContemporaryViewson Ar- SchoolofOrientalandAfricanStudies,volume35,
chitectureandRepresentationsinPhonologicalThe- pages74–84.UniversityofLondon.
ory.Cambridge,MA:MITPress.
[SigmanandCecchi2002] M. Sigman and G. A. Cec-
[Farkasetal.2001] E. J. Farkas, I. Derenyi, A. -L. chi. 2002. Globalorganizationofthe wordnetlex-
Baraba´si,andT.Vicseck. 2001. Real-worldgraphs: icon. Proceedingsofthe NationalAcademyofSci-
Beyondthesemi-circlelaw. Phy.Rev.E,64:026704. ence,99(3):1742–1747.
[Trubetzkoy1931] N. Trubetzkoy. 1931. Die phonolo-
gischensysteme. TCLP,4:96–116.
[Trubetzkoy1969] N. Trubetzkoy. 1969. Principles of
Phonology. University of California Press, Berke-
ley.
[Valleeetal.2002] N.Vallee,L.J.Boe,J.L.Schwartz,
P. Badin, and C. Abry. 2002. The weight of pho-
neticsubstanceinthestructureofsoundinventories.
ZASPiL,28:145–168.