Promoting academic writing in English for the World Wide Web: Part III [Archives:2005/866/Education]

August 8 2005

Prof R K Jayaraman
Department of English
Faculty of Education
Sana'a University, Sana'a

In Part I of the article, I drew attention to two conventional modes of information interchange that academics normally resort to in satisfying their academic needs, and raised the question of what it means to satisfy such needs in the Web environment. I also considered in some detail the problems of electronically simulating academically oriented 'speech'. The article ended with the suggestion that an inter-institutional collaborative project involving interdisciplinary research, carried out in stages, may be a good starting point for progress in the area of Academic Writing in English for the Web.

The first stage of the project should ideally concentrate on academically oriented interaction with the Web regarded as a collection of written texts. Components that handle speech recognition and speech synthesis can be added later.

This calls for a brief discussion of 'information interchange through written texts'.

In conventional settings of information interchange, people resort to speech more readily than to writing, because speech is more universal. In computing, however, writing (that is, 'keying in') is far easier to process than speech. At the risk of oversimplifying, one may claim that 'writing' is 'speech' minus features, such as voice quality, that are specific to the individual speaker, and is therefore far less ambiguous. For this reason, EAP for the Web in the form of written texts is a far simpler entity than EAP for the Web in the form of spoken discourse. This explains both the uncontrolled proliferation of written documents available as web pages and the almost total absence of 'spoken discourse' as 'interactive' web pages. Where such 'speech' pages do exist, they can at present link to other web pages only through graphics or HTML anchors, not through 'spoken signals' ('hsml' anchors?).

Whether in the form of 'speech' or 'writing', if academic materials in English for the Web are to be regarded as “excellent”, attention must be paid to two things: (a) quality control measures, and (b) accessibility procedures.

(a) Quality control measures:

In the case of academic materials “in print” (whatever the language), a discerning reader can easily make out which book or journal article is worth reading and which is not. Editorial committees of many different persuasions are constantly at work to keep rubbish from being published in internationally reputed journals or as monographs, and a publisher's name often guarantees the quality of a book. On the Web, by contrast, there is the “firewall”, which filters out sensitive materials for non-academic reasons. Apart from that, there is hardly anything worth the name to control the quality of the academic materials that appear. Many sites do carry a popularity rating, determined by the number of visits users have made to them, but a popularity rating is not the same as a quality rating. A quality rating, however, is not difficult to achieve. Experts from academic departments of institutions of higher learning in different parts of the globe could form editorial committees under the auspices of a reputed publishing agency. They could then review the materials (including their links) and assign a suitable quality index to them.

Again, when writing for a journal or publishing a book on any academic topic, the writer is often guided by well-known style manuals, such as the MLA Style Sheet or the Chicago Manual of Style. There is an urgent need to create a comparable set of guidelines for academically oriented writing in English for the Web. The academic departments of various universities may take the lead in this regard and lay down guidelines for the use of technical terms in their respective fields. This may mean creating an Augmented Transition Network of all the technical terms of a discipline. Such a network would show the semantic and logical relationships that obtain among the terms and how the discipline itself relates to other disciplines, so that it can feed into a larger “Knowledge Representation System” designed specifically for academic purposes. The moment the Internet detects that someone intends to create a web page, it could prompt him or her to use something like “an Internet Style Manual”. This, of course, is a matter of detail that need not be pursued further here.

(b) Accessibility procedures:

In the first place, accessing a web site or a web page is different from using links within a page to move to other sections of the same page; a university homepage is a good example of the second kind. An academic will need to do both. Earlier in this paper, keyword-based searches were shown to be inadequate for academic purposes. A keyword is a unique index that does not recognize meaning as 'substance', nor the social, political, moral, ethical, and emotional dimensions of a term keyed in by the user. For a search to be cost-effective, relevant, and of high quality, a search engine should go far beyond 'keyword-based' search and be made 'intelligent'. Under a 'keyword-based' search, a huge list of web pages would all qualify as 'matches' for a given 'term'. An 'intelligent' search engine makes a rational decision when asked to select from among them. In other words, it knows its user: it keeps track of his/her academic interests and knows exactly what he/she needs at a particular point in his/her pursuit. It keeps all the 'false' candidates at bay, and thereby makes the search cost-effective and relevant and ensures speed of access.

What does an engine of this dimension look like? To answer this question, one enters an area where it is difficult to resist the temptation to become technical. But keeping the interests of an EAP audience in view, I would like to avoid, if you will pardon the expression, the 'nitty-gritty' of information technology and keep to a reasonably broad statement of the issues involved.

1. To start with, the engine should avoid what is called 'user disorientation'. The term is best illustrated by an anecdote. Ali brings Adel, a blind man, the news of Ameen's death. The following conversation takes place:

Adel: That's terrible! Well, how did he die?

Ali: He choked while drinking milk.

Adel: What is milk?

Ali: Milk is something white in colour.

Adel: What do I, a blind man, know about colours? What is white, tell me.

Ali: Well, white is the colour of a heron.

Adel: You are testing my patience. What is a heron?

Ali: Well, a heron is a water bird that looks somewhat like my hand bent like this. Can you feel it and see?

Ali bends his hand in the shape of a bird and guides Adel to feel it. After doing so, Adel exclaims:

Adel: No wonder the fellow got choked. Anyway, what made him drink something so big and crooked?

The anecdote is, of course, unreal and illogical, but it makes a point: the interlocutors fail to sustain the theme of their conversation. Something comparable often happens when one browses the Internet, and before long one feels totally lost amid a barrage of irrelevant hits. Such drift can be entertaining if one is browsing the Internet for distraction, but it is frustrating when the search is academically motivated. [Look at, for example, the search results that I obtained through the Yahoo and MSN search engines for “Government and Binding Theory of Syntax”.] (transparencies)

In order to avoid 'user disorientation', the engine must develop a 'user profile'. This means keeping track of the user's interests and maintaining a record of all the web pages that he/she visits. The engine associates his/her identity with that record and, on this basis, understands his/her continually changing 'schema'. Suppose the engine finds a large number of matches for the search term entered. Even after applying its knowledge of the user's schema, it may still be unable to tell which matches are more relevant than others. It may then resort to an interactive, question-based search, seeking further clarification of the 'semantic load' of the search term keyed in. 'Flagging' is a technique commonly used for developing a user profile.
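To make the idea of a user profile a little more concrete, here is a minimal sketch in Python. It assumes a deliberately crude model in which a profile is simply a count of the terms on every page the user has visited. The names `UserProfile` and `rank_or_ask` and the `margin` threshold are hypothetical illustrations, not part of any real search engine. Keyword matches are re-ranked by how familiar their terms are to this user. When the two best candidates are too close to call, the sketch also returns a clarifying question, in the spirit of the interactive, question-based search described above.

```python
from collections import Counter


class UserProfile:
    """Hypothetical user profile: the terms of every page a user has visited."""

    def __init__(self, user_id):
        self.user_id = user_id       # the identity the record is associated with
        self.history = []            # URLs visited, in order
        self.interests = Counter()   # term -> how often the user has met it

    def record_visit(self, url, page_terms):
        # Each visit updates the record and, with it, the user's evolving "schema".
        self.history.append(url)
        self.interests.update(t.lower() for t in page_terms)

    def score(self, page_terms):
        # A page scores higher the more familiar its terms are to this user.
        return sum(self.interests[t.lower()] for t in set(page_terms))


def rank_or_ask(profile, matches, margin=1):
    """Rank keyword matches (url -> terms) by the profile. If the best two are
    too close to call, also return a clarifying question for the user."""
    ranked = sorted(matches, key=lambda url: profile.score(matches[url]), reverse=True)
    if len(ranked) > 1:
        gap = profile.score(matches[ranked[0]]) - profile.score(matches[ranked[1]])
        if gap < margin:
            return ranked, "Which sense of the search term do you have in mind?"
    return ranked, None
```

Consider a user who has been reading about syntax. For that user, a page on "binding" in linguistics outranks a page on leather "binding" without any question being asked. Two pages unrelated to the profile tie at zero, and that tie is what triggers the clarifying question.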

2. Let us examine next how a search engine is related to what is called 'a knowledge representation system', and what additional features must be incorporated into such a system if the search engine is to function 'intelligently'. A knowledge representation system is a network of semantically related terms. It is a way of organizing 'content', or, in a manner of speaking, the search engine's schema. The web pages constituting the search space are linked to this network for optimal output. In view of this, the network may be said to operate midway between the user and the hits being searched for. Such a network for EAP will require millions of terms entering into an enormous number of semantic and logical relations with each other. For the search engine to be 'intelligent', the knowledge representation system on which it is mounted needs two things. It needs an exhaustive representation of concepts in a finely tuned network of semantic and logical relations, and it should also include an inference engine. The inference engine is intended to deduce the exact field of knowledge to which the user's search term belongs. The moment a search term is routed through it, it should match the term not just against an identical term in the system's repertoire, but against the entire network of relations that the term enters into. Only then can it identify the user's area of academic interest appropriately.
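The following is a minimal sketch, under stated assumptions, of what "matching a term against the network of relations it enters into" might look like. The network is stored as a handful of (term, relation, term) triples, and a few nodes are marked as fields of knowledge. All terms, relation names and field labels are hypothetical examples. The "inference" here is only a breadth-first walk along relations until a field is reached. It is a much simpler stand-in for the inference engine the paragraph envisages, not an implementation of it.

```python
from collections import deque

# A toy knowledge representation network, written as (term, relation, term) triples.
# The terms, relations and field labels here are illustrative assumptions only.
TRIPLES = [
    ("c-command", "used_in", "government and binding"),
    ("government and binding", "is_a", "syntactic theory"),
    ("syntactic theory", "part_of", "syntax"),
    ("syntax", "branch_of", "linguistics"),
    ("bookbinding", "is_a", "craft"),
    ("craft", "branch_of", "applied arts"),
]
FIELDS = {"linguistics", "applied arts"}


def build_network(triples):
    """Index the triples as an adjacency list: term -> [(relation, related term), ...]."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


def infer_field(term, graph, fields):
    """Breadth-first walk over the term's relations until a node marked as a field
    of knowledge is reached. Returns (field, path) or (None, [term])."""
    queue = deque([[term]])
    seen = {term}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in fields:
            return node, path
        for _relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None, [term]
```

The term "c-command" contains no word pointing to linguistics, so a keyword match alone would not place it. The walk places it there through "government and binding", "syntactic theory" and "syntax". The returned path also records why that field was chosen.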

3. Thirdly, an academic not only searches an existing body of knowledge but also contributes to it. A new concept may be floated, a new theory formulated, or a hitherto taken-for-granted paradigm questioned. When that happens, the knowledge representation system on which the search engine is mounted may need to be altered, if not thoroughly revamped. In other words, a knowledge representation system should change continually to keep pace with current usage. This will only be possible if the system is able to ask the user questions every now and then to clarify its own doubts. In this way it can learn more about his/her topic of interest and serve him/her the information he/she requires. It must be programmed to learn through its interactions with users, and this is the main concern of 'machine learning', a branch of Artificial Intelligence.
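A system that clarifies its own doubts by asking the user could, in its very simplest form, behave like the sketch below. It is a hedged illustration, not machine learning in any serious sense. It assumes the network is reduced to "broader topic" links (a hypothetical `broader` dictionary) and that `ask` is some function that puts a question to the user and returns the answer. When a term cannot be traced up to a known field, the system asks about the point where its knowledge runs out. It stores the answer as a new link, so the same question need not be asked again.

```python
def walk_up(term, broader, fields):
    """Follow 'broader topic' links upward from a term.
    Returns (field or None, last node reached)."""
    seen = set()
    while term not in fields and term in broader and term not in seen:
        seen.add(term)
        term = broader[term]
    return (term if term in fields else None), term


def classify_or_learn(term, broader, fields, ask, max_questions=3):
    """Place a term in a field. Where the network runs out, ask the user and
    store the answer as a new link, so the network grows through use."""
    field, last = walk_up(term, broader, fields)
    asked = 0
    while field is None and asked < max_questions:
        broader[last] = ask(f"What broader topic does '{last}' belong to?")
        asked += 1
        field, last = walk_up(term, broader, fields)
    return field
```

The `max_questions` limit is an assumed safeguard so that unhelpful answers cannot keep the dialogue going forever. A real system would also need some way of checking what users teach it.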

4. Then there are cases where the user is not able to spell out his/her query clearly, perhaps because he/she has only partial knowledge of the subject matter about which he/she seeks more information. This is where the search engine turns to 'default reasoning', fuzzy logic, or a probabilistic model of reasoning to make bold guesses about the user's needs.
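Of the three options named above, the probabilistic one is the easiest to illustrate briefly. The sketch below uses a naive Bayes guess: for each field it asks how likely the user's few query words are, given word counts from documents already filed under that field. It assumes equal prior probabilities for all fields and add-one smoothing, and the word counts in `FIELD_WORDS` are invented for illustration. If none of the query words is known at all, it falls back on a fixed default reading. That fallback is only a crude nod to default reasoning.

```python
import math
from collections import Counter

# Hypothetical training data: how often each word appears in documents filed under each field.
FIELD_WORDS = {
    "linguistics": Counter({"syntax": 5, "binding": 3, "phoneme": 4, "clause": 2}),
    "library science": Counter({"binding": 4, "catalogue": 3, "archive": 2}),
}


def guess_field(query_words, field_words, default="general reference"):
    """Naive Bayes guess (equal priors, add-one smoothing) at the field a
    partial query belongs to; fall back to a default if no word is known."""
    vocab = set().union(*field_words.values())
    known = [w for w in query_words if w in vocab]
    if not known:
        return default  # nothing to reason from: assume the most general reading
    best, best_score = None, -math.inf
    for field, counts in field_words.items():
        total = sum(counts.values())
        # Sum of log-probabilities avoids underflow when many words are multiplied.
        score = sum(math.log((counts[w] + 1) / (total + len(vocab))) for w in known)
        if score > best_score:
            best, best_score = field, score
    return best
```

With these counts, the single word "binding" tips the guess towards library science. Adding "syntax" to the query moves it to linguistics, which is the kind of bold but revisable guess the paragraph describes.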

5. Most important of all, the search engine should have recourse to what is sometimes known as 'exclusion handling'. Exclusion handling is a matter of deciding, fairly early on in the search path, 'what not to choose' as against 'what to choose', so that the search space for the term entered is drastically narrowed down at the very beginning of the search and the other procedures are then able to work more effectively.
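Exclusion handling can be pictured as a cheap filter run before any costlier ranking. In the minimal sketch below, each page is assumed to be tagged with a field in advance. The set of excluded fields is assumed to come from elsewhere, for instance from the user profile or from an inference step like the ones described earlier. The function name and page format are hypothetical. Whole fields are discarded first, and only the surviving pages are scored by how many query terms they contain.

```python
def exclude_then_rank(pages, excluded_fields, query_terms):
    """Exclusion first: discard whole fields the user is not working in (a cheap test),
    then rank only the survivors by how many query terms they contain."""
    survivors = [p for p in pages if p["field"] not in excluded_fields]
    query = {t.lower() for t in query_terms}
    scored = [(len(query & {w.lower() for w in p["terms"]}), p["url"]) for p in survivors]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [url for score, url in scored if score > 0]
```

Take a search for "Government" and "Binding" with crafts and politics excluded. Bookbinding and government-policy pages never reach the ranking step, so only the linguistics pages are scored. Without the exclusion, those off-topic pages would sit among the hits.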

In sum, we need a search engine that makes a rational choice among the available matches/hits, taking into account the user's schema, purpose, and specific needs, whether social, political, emotional, or otherwise. It should not waste the user's time with irrelevant hits. If a particular web page is highly relevant to a user's purpose, the search engine must have a way of identifying it, even if the page contains none of the search terms the user specified.

The implications of all this for academic writing in English for the Web are enormous. At a very broad level, I suggest two sets of actions, to be implemented concurrently. First, there is a colossal body of existing literature in various academic disciplines that urgently needs to be converted into a machine-readable format, so that we are ready in time to usher in the inevitable era of 'paperless, printless societies'. Secondly, an elaborate set of meticulously worked-out conventions for future writing in English for academic purposes needs to be developed. According to a recent study, for example, people searching the Web for large amounts of information soon begin to suffer from visual fatigue and grow impatient for summaries. This implies that structural changes may have to be made in the way people write for the Web. Lengthy preambles will soon be out of fashion. Every section of an article may have to begin with its conclusion and give the reasoning that yields that conclusion afterwards. Again, information may have to be laid out so as to promote scanning rather than intensive reading. Conventions like these, and many more, need to be circulated widely among institutions of higher learning in different parts of the world so that EAP on the Web becomes standardized. This speaker is aware that he is not the first to make a recommendation of this nature, but the recommendation bears repetition.

Finally, a few disturbing questions: Where will all this take us? Do we want to pursue this effort to its logical conclusion? Isn't there a point at which the machine stops and the man takes over? The audience may add their own questions to the list.