Using Corpora to Analyze Gender

The addition of this element allows XML-encoded text of any degree of complexity to be integrated within the heavily structured metadata architecture of PML in a manner analogous to the integrated text and metadata used in many e-text projects.

Integrating the text in this way is, of course, not strictly necessary to perform the type of analysis discussed in this article: the linkages established by the sourceRef attribute would readily allow the disparate corpora of PML and text files to be treated as a single entity.

Applying sentiment analysis to the PML corpus

To test the feasibility of using PML as an analytical tool, simple sentiment analysis was carried out on a portion of the corpus covering a single topic: attitudes to immigration.

Like all sentiment analysis, this attempted to discern the perspective or attitude of the speaker, but it was limited to a basic categorization of their expressed views by polarity: positive, negative or neutral. It did not try to discern more complicated sentiments such as emotional moods or states. The software used can rapidly process multiple files to identify subjective linguistic elements, which are then automatically tagged using inline SGML elements [ Wilson et al.

A paragraph element from a PML file can be tagged in this way, with each subjective expression marked inline. For the illustrative purposes of this article, these tags were used to generate a basic measure of the degree of positive or negative sentiment per paragraph by counting the relative numbers of positive, neutral and negative sentiments marked. Clearly, more sophisticated techniques are available for analysing output of this type, but these simple measures are sufficient to illustrate the capabilities of PML for contextualising these results.
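The per-paragraph measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element name `sentiment` and the attribute `polarity` are hypothetical stand-ins for the tagger's actual inline markup, the paragraph is assumed to be well-formed XML rather than loose SGML, and the score formula (positive minus negative, divided by all marked expressions) is one plausible reading of "relative numbers", not necessarily the calculation used for the figures in this article.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def polarity_counts(paragraph_xml, tag="sentiment", attr="polarity"):
    """Count inline sentiment tags in one paragraph by polarity.

    `tag` and `attr` are hypothetical names; substitute whatever the
    sentiment tagger actually emits.
    """
    paragraph = ET.fromstring(paragraph_xml)
    return Counter(
        el.get(attr) for el in paragraph.iter(tag) if el.get(attr)
    )


def polarity_score(counts):
    """Return a score in [-1, 1]: -1 is wholly negative, +1 wholly positive.

    Neutral expressions count towards the total, so they pull the
    score towards zero.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts.get("positive", 0) - counts.get("negative", 0)) / total


# Invented sample paragraph, for illustration only.
sample = (
    '<p>We <sentiment polarity="positive">welcome</sentiment> skilled '
    'workers but <sentiment polarity="negative">fear</sentiment> the '
    '<sentiment polarity="negative">abuse</sentiment> of the system.</p>'
)
counts = polarity_counts(sample)
score = polarity_score(counts)
```

Keeping the counting and the scoring in separate functions means the same counts can feed a different measure later without re-parsing the corpus.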

Using the PML architecture to contextualise sentiment analysis data

Several examples will illustrate something of the architecture's potential for enriching the output of linguistic analyses. One obvious line of enquiry is to examine the sentiments expressed by individual MPs. This is readily done by using the contributorID attribute shown in Example 2, which links a contribution to the person making it. In the resulting chart, black indicates the most negative attitudes, reddish brown those which tend to be more neutral, and orange the most positive.
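Grouping paragraph scores by contributor can be sketched as follows. The record layout, the identifiers `mp-a` and `mp-b`, the score range of -1 to 1 and the threshold of 0.25 are all assumptions made for illustration; only the contributorID attribute and the three-colour scheme come from the description above. The same function groups by party or gender if the records carry the corresponding key.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by(records, key):
    """Average the sentiment score of all records sharing the same `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["score"])
    return {group: mean(scores) for group, scores in groups.items()}


def colour_band(score, threshold=0.25):
    """Map a score in [-1, 1] onto the three colours used in the chart.

    The threshold is an arbitrary assumption, not a value from the article.
    """
    if score <= -threshold:
        return "black"          # most negative
    if score >= threshold:
        return "orange"         # most positive
    return "reddish brown"      # broadly neutral


# Invented records: one per tagged paragraph, keyed by contributorID.
records = [
    {"contributorID": "mp-a", "score": -0.5},
    {"contributorID": "mp-a", "score": -0.25},
    {"contributorID": "mp-b", "score": 0.5},
]
by_contributor = mean_score_by(records, "contributorID")
bands = {mp: colour_band(s) for mp, s in by_contributor.items()}
```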

Although this chart reflects only a very simple analysis, it does reveal some interesting results. The then Home Secretary, Theresa May, tended initially to express relatively neutral sentiments towards immigration, but these grew significantly more negative as the parliamentary term progressed. This suggests that her negative views on immigration formed over a number of years. A further level of analysis applies the same simple statistical technique and colour coding to visualise patterns of sentiment by political party.

This analysis, which uses the same colour coding as Figure 1, shows the possibly surprising result that there is little difference between the parties in their predominantly negative sentiments expressed towards immigration during the first year of this Parliament. The left-of-centre Labour party appears as negative as the right-wing Conservative party.

This may reflect the beginnings of an initiative by the then Labour leader to attempt to make his party appear tougher on immigration, which some at the time felt was an issue that had lost them votes in the General Election of , and which later led to more pronounced comments and policies on this subject [ Miliband ]. A further level of analysis could extend this breakdown by party to incorporate gender.

This is readily done, as the same categoryID mentioned above may be used to delineate by gender.

More sophisticated analyses may be achieved by utilising a further feature of PML, its extensive use of URIs to reference almost every component: these may readily be used to interface with externally held data. One interesting approach to examining patterns of sentiment expressed in the proceedings is to consider their geo-spatial dimension. Each constituency recorded in a PML file, for instance, may be labelled with a URI, which in turn can be used to generate geo-data to enable visualisations demonstrating the geographical patterning of the sentiments recorded in parliamentary speeches.

One approach to enabling this is to use files encoded in Keyhole Markup Language (KML) [ Google Developers ], which have already been published to cover UK parliamentary constituencies. For the purpose of this demonstration, the seventy-two files representing constituencies for Greater London were used.

Each KML file is run through an XSLT transformation or XQuery query which retrieves contributions by the member for each constituency, calculates the relative weights of the positive, negative and neutral values in the sentiments expressed within these, assigns a score from 1 to based on these sentiments and generates a colour code based on this value.

This value then replaces the one already present in the KML element that designates the fill colour for the polygon defining the boundaries of each constituency. This process may be used across the whole corpus or be restricted by date, down to a fine level of granularity if required.
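The article performs this step with XSLT or XQuery; as a rough equivalent, the colour replacement can be sketched in Python with the standard library. Several things here are assumptions rather than details from the source: the score is taken to lie between -1 and 1 (not the article's own scale), the colour ramp is a simple linear blend from black to orange, and the function names are hypothetical. What is standard is the KML side: the fill colour lives in a `color` element inside `PolyStyle`, written as eight hex digits in the order alpha, blue, green, red.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # keep the default namespace on output


def kml_colour(score, alpha=0xCC):
    """Convert a score in [-1, 1] to a KML colour string (aabbggrr).

    Blends linearly from black (most negative) to orange (most positive);
    the midpoint comes out as a brown. The ramp is an assumption.
    """
    t = (max(-1.0, min(1.0, score)) + 1.0) / 2.0
    red, green, blue = round(255 * t), round(165 * t), 0
    return f"{alpha:02x}{blue:02x}{green:02x}{red:02x}"


def recolour(kml_text, score):
    """Return the KML document with every polygon fill colour replaced."""
    root = ET.fromstring(kml_text)
    for poly_style in root.iter(f"{{{KML_NS}}}PolyStyle"):
        colour = poly_style.find(f"{{{KML_NS}}}color")
        if colour is None:
            colour = ET.SubElement(poly_style, f"{{{KML_NS}}}color")
        colour.text = kml_colour(score)
    return ET.tostring(root, encoding="unicode")


# Minimal invented KML fragment standing in for one constituency file.
sample_kml = (
    f'<kml xmlns="{KML_NS}"><Document><Style id="c">'
    "<PolyStyle><color>ff0000ff</color></PolyStyle>"
    "</Style></Document></kml>"
)
recoloured = recolour(sample_kml, 1.0)
```

Restricting the analysis by date would only change which contributions feed the score; the recolouring step itself stays the same.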

The resulting set of KML files may be used to generate colour-coded maps using any compliant software. As may be expected in the most diverse and multi-cultural city in the United Kingdom, most sentiments expressed towards immigration by London MPs tend towards the positive, the only exception being those from the member for Mitcham and Morden.

This is, of course, a visualisation of data from a single year, the year of the General Election: it would be relatively simple to generate animated visualisations representing any changing sentiments within this geographic area over time.

Comparison with other approaches to parliamentary metadata

PML's rich set of semantic linkages for parliamentary metadata offers a potentially more powerful base for machine-readable analysis than the other approaches detailed above. At this point, it is useful to compare how its approach differs from these.

