Adriana Ladislau - English to Portuguese translator. Translation services in Business/Commerce (general)

Working languages:

English to Portuguese
Portuguese to English

Adriana Ladislau
Legal English

Belo Horizonte, Minas Gerais, Brazil

Native in: Portuguese

Freelance translator and/or interpreter

This person is not affiliated with any business or Blue Board record at ProZ.com.

Translation, Editing/proofreading

Specializes in:
Law: Contract(s)	Business/Commerce (general)

English to Portuguese - Standard rate: 0.06 USD per word / 48 USD per hour
Portuguese to English - Standard rate: 0.06 USD per word / 48 USD per hour

Sample translations submitted: 1

English to Portuguese: Multidimensional Analysis Tagger
Source text - English

Multidimensional Analysis Tagger (v. 1.3) – Manual

The Multidimensional Analysis Tagger (MAT) is a program for Windows that replicates Biber's (1988) tagger for the multidimensional functional analysis of English texts, generally applied for studies on text type or genre variation. The program generates a grammatically annotated version of the corpus or text selected as well as the statistics needed to perform a text-type or genre analysis. The program plots the input text or corpus on Biber’s (1988) Dimensions and it determines its closest text type, as proposed by Biber (1989). Finally, the program offers a tool for visualising the Dimensions features of an input text. A summary of Biber’s Dimensions and text types is provided below.

This is an implementation of the tagger used in Biber (1988) and in many other works. This tagger tries to replicate the analysis in Biber (1988) as closely as possible by taking into account the algorithms that the author presented in the Appendix of the book. The basic analysis of the text is done through the Stanford Tagger. The present tagger includes a copy of the Stanford Tagger (2013) which is run automatically to produce a preliminary grammatical analysis. MAT then expands the Stanford Tagger tag set by identifying the linguistic features used in Biber (1988). This document includes an extensive description of the tagger as well as some instructions for the user.

Referencing the tagger

To reference the tagger, please use the following:

Nini, A. 2015. Multidimensional Analysis Tagger (Version 1.3). Available at: http://sites.google.com/site/multidimensionaltagger

This program is based on the Stanford Tagger and it is therefore necessary to reference the Stanford Tagger any time the program is used. To reference the Stanford Tagger, please refer to the Stanford Tagger website: http://nlp.stanford.edu/software/tagger.shtml.

Architecture of the program

Requirements: the program requires Java to run.
This can be downloaded from http://java.com/en/download/index.jsp

Tagger

This module of the program accepts as input only plain text files in the format ‘.txt’. The user can select either a folder of .txt files or a single .txt file. It is also possible to simply drag and drop a file or folder to the button. The MAT tagger uses the Stanford Tagger for an initial segmentation into parts of speech and then finds the patterns described in Biber (1988). Some basic Stanford Tagger tags are replaced by new tags that are more specific. For example, negations and prepositions are distinguished, respectively, from general adverbs and general subordinators. The word to used as an infinitive marker is disambiguated from the word to used as a preposition. Three tags are added in order to facilitate the identification of Biber’s (1988) linguistic features. These are:

(1) indefinite pronouns (INPR): anybody, anyone, anything, everybody, everyone, everything, nobody, none, nothing, nowhere, somebody, someone, something;
(2) quantifiers (QUAN): each, all, every, many, much, few, several, some, any;
(3) quantifier pronouns (QUPR): everybody, somebody, anybody, everyone, someone, anyone, everything, something, anything.

A full list of tags and a description of the algorithms used to find them is given below.

The Stanford tagged texts will appear in a folder called ‘ST_name_of_folder’ or ‘ST_name_of_file’. The MAT tagged texts will appear in a folder called ‘MAT_name_of_folder’ or ‘MAT_name_of_text’. Both folders will be created in the folder selected for the analysis.

When the tagger is launched, a module of the tagger will check the encoding of the .txt files selected. The tagger will then flag any text in UNICODE and it is up to the user to change this to a compatible format, such as ANSI or UTF-8. After this stage, the tagger will scan each of the .txt files in order to find instances of curly inverted commas.
This step is necessary as otherwise some contractions are not tagged properly. If the tagger finds any instance of curly commas it will replace them with standard commas. This will overwrite the file, so the original .txt file with the curly commas will be lost. If it is necessary to keep the original with curly commas then it is recommended to create a backup copy before running MAT.

Analyser

This module of the program can be called either via the ‘Analyse’ button or via the ‘Tag and Analyse’ button. It is also possible to simply drag and drop a file or folder to the button. When this module starts, the user will be asked to input the number of tokens for which the type-token ratio should be calculated (for details see the entry on type-token ratio in the list of variables). By default, this number is 400, as set in Biber (1988). The user will then be asked to choose which Dimensions to display graphically.

The result of the analysis consists of a number of output files that will be created in a folder called ‘Statistics’ contained in the same folder that contains the MAT tagged texts. These files are:

1) ‘Corpus_Statistics.txt’: a tab delimited file that shows the frequency per 100 tokens for all the linguistic variables (see below) found in the input text or corpus. If the user selects the option ‘all tags’, then this file will display the counts for all the tags in the text, including the punctuation items. On the other hand, if the user selects the option ‘only VASW tags’, then only the tags used in Biber (1988) will be displayed.

2) ‘Zscores.txt’: a tab delimited file that includes the z-scores of the linguistic variables for the input file or corpus. If the user has selected a folder of text files as input, then the averages for the corpus are shown. The z-scores are calculated on the basis of the means and standard deviations presented in Biber (1988: 77).
For each text and for the corpus as a whole, the program will flag all the z-scores with a magnitude higher than 2 as ‘Interesting variables’. The z-scores displayed in this file are not affected by the user’s selection of the z-score correction. The option ‘z-score correction’ affects only the calculation of the Dimension scores.

3) ‘Dimensions.txt’: a tab delimited file that contains the scores for the Dimensions as well as the averages for the corpus, if the user has selected a folder of text files. The Dimension scores are calculated using the z-scores of the variables that presented a mean higher than 1 in the chart presented in Biber (1988: 77). The reliability of the Dimension scores produced by MAT was checked against the LOB and the Brown corpus. The results of the tests are presented below. The program classifies each text according to its closest text type as proposed by Biber (1989) using Euclidean distance. If the user has selected as input a folder of texts, then the averages for the corpus are provided. If the user has chosen to use the z-score correction, then the Dimension scores reflect that choice. When the user selects the z-score correction, all the z-scores used to calculate the Dimension scores are first checked for their magnitude. If the absolute value of a z-score is higher than 5, the program will change it to 5. This correction avoids the problem of a few infrequent variables affecting the overall Dimension scores. This option should be used with caution and is recommended only for very short texts.

4) ‘Dimension#.png’: a graph that displays the location of the input text’s Dimension score compared to a number of genres as shown in Biber (1988: 172). The graph displays the mean and the range for each genre. If the user has selected as input only one text, then the Dimension score for that text is shown.
On the other hand, if the user has selected a corpus as input, then the mean and the range for that corpus are displayed. The program will print the closest genre to the user’s text or corpus next to the title of the graph. MAT produces as many Dimension graphs as the user has selected.

5) ‘Text_types.png’: a graph representing the location of the analysed text or corpus in relation to Biber's (1989) eight text types. The program will print the closest text type to the user’s text or corpus next to the title of the graph. Text types are assigned using Euclidean distance.

Inspect tool

This tool allows the user to display the Dimension features of a single text. It is also possible to simply drag and drop a MAT file to the button for the function to start. The user can choose which Dimensions to visualise. Once the tool is used, a new file named ‘FILENAME_features.html’ will be created in the folder where the selected text is located. This tool can be used only with MAT tagged texts.

A summary of Biber’s (1988) Dimensions

Dimension 1 is the opposition between Involved and Informational discourse. Low scores on this variable indicate that the text is informationally dense, as for example academic prose, whereas high scores indicate that the text is affective and interactional, as for example a casual conversation. A high score on this Dimension means that the text presents many verbs and pronouns (among other features) whereas a low score on this Dimension means that the text presents many nouns, long words and adjectives (among other features).

Dimension 2 is the opposition between Narrative and Non-Narrative Concerns. Low scores on this variable indicate that the text is non-narrative whereas high scores indicate that the text is narrative, as for example a novel. A high score on this Dimension means that the text presents many past tenses and third person pronouns (among other features).
Dimension 3 is the opposition between Context-Independent Discourse and Context-Dependent Discourse. Low scores on this variable indicate that the text is dependent on the context, as in the case of a sports broadcast, whereas high scores indicate that the text is not dependent on the context, as for example academic prose. A high score on this Dimension means that the text presents many nominalizations (among other features) whereas a low score on this Dimension means that the text presents many adverbs (among other features).

Dimension 4 measures Overt Expression of Persuasion. High scores on this variable indicate that the text explicitly marks the author’s point of view as well as their assessment of likelihood and/or certainty, as for example in professional letters. A high score on this Dimension means that the text presents many modal verbs (among other features).

Dimension 5 is the opposition between Abstract and Non-Abstract Information. High scores on this variable indicate that the text provides information in a technical, abstract and formal way, as for example in scientific discourse. A high score on this Dimension means that the text presents many passive clauses and conjuncts (among other features).

Dimension 6 measures On-line Informational Elaboration. High scores on this variable indicate that the text is informational in nature but produced under certain time constraints, as for example in speeches. A high score on this Dimension means that the text presents many postmodifications of noun phrases (among other features).
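The z-score handling described for the Analyser output (flagging magnitudes above 2 as ‘Interesting variables’ and, when the correction is selected, capping magnitudes above 5) can be illustrated with a minimal Python sketch. The function names are hypothetical, the numbers passed in are placeholders rather than the means and standard deviations from Biber (1988: 77), and keeping the sign when capping is an assumption, since the manual only states that the value is changed to 5.

```python
def z_score(frequency, reference_mean, reference_sd):
    """Standardise a variable's frequency against a reference mean and SD."""
    return (frequency - reference_mean) / reference_sd


def is_interesting(z, threshold=2.0):
    """Flag a z-score whose magnitude is higher than 2, as the manual describes."""
    return abs(z) > threshold


def apply_correction(z, cap=5.0):
    """Cap the magnitude of a z-score at 5.

    Assumption: the sign is preserved, so -6.0 becomes -5.0.
    """
    return max(-cap, min(cap, z))


# Placeholder figures, chosen only to make the arithmetic easy to follow.
z = z_score(frequency=1.5, reference_mean=0.5, reference_sd=0.25)
print(z, is_interesting(z), apply_correction(z))
print(apply_correction(7.2), apply_correction(-6.0))
```

According to the manual, only the Dimension scores use the capped values; the z-scores written to ‘Zscores.txt’ stay uncorrected.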
A summary of Biber’s (1989) text types

Intimate Interpersonal Interaction
Characterising genres: telephone conversations between personal friends
Characterising Dimensions: high score on D1, low score on D3, low score on D5, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically interactions that have an interpersonal concern and that happen between close acquaintances

Informational Interaction
Characterising genres: face-to-face interactions, telephone conversations, spontaneous speeches, personal letters
Characterising Dimensions: high score on D1, low score on D3, low score on D5, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically personal spoken interactions that are focused on informational concerns

Scientific Exposition
Characterising genres: academic prose, official documents
Characterising Dimensions: low score on D1, high score on D3, high score on D5, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically informational expositions that are formal and focused on conveying information and very technical

Learned Exposition
Characterising genres: official documents, press reviews, academic prose
Characterising Dimensions: low score on D1, high score on D3, high score on D5, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically informational expositions that are formal and focused on conveying information

Imaginative Narrative
Characterising genres: romance fiction, general fiction, prepared speeches
Characterising Dimensions: high score on D2, low score on D3, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically texts that present an extreme narrative concern

General Narrative Exposition
Characterising genres: press reportage, press editorials, biographies, non-sports broadcasts, science fiction
Characterising Dimensions: low score on D1, high score on D2, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically texts that use narration to convey information

Situated Reportage
Characterising genres: sports broadcasts
Characterising Dimensions: low score on D3, low score on D4, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically on-line commentaries of events that are in progress

Involved Persuasion
Characterising genres: spontaneous speeches, professional letters, interviews
Characterising Dimensions: high score on D4, unmarked scores for the other Dimensions
Description: Texts belonging to this text type are typically persuasive and/or argumentative

Reliability tests for the program

The program was tested for reliability on the LOB and on the Brown corpus. These results are reproduced below.

Table 1 – MAT analysis of the LOB corpus compared to Biber’s (1988) results

Genre and source                        D1      D2      D3      D4      D5      D6
Press reportage - MAT               -14.02    0.97    2.81   -0.38    0.52   -0.72
    Text types: 59% General narrative exposition; 39% Learned exposition; 2% Involved persuasion; 2% Scientific exposition
Press reportage - Biber (1988)      -15.01     0.4    -0.3    -0.7     0.6    -0.9
    Text types: 73% General narrative exposition; 25% Learned exposition; 2% Scientific exposition
Difference                            0.99    0.57    3.11    0.32    0.08    0.18

Press editorials - MAT                -8.4   -0.28    4.38     3.3     1.5    0.33
    Text types: 81% General narrative exposition; 7% Involved persuasion; 7% Scientific exposition; 4% Learned exposition
Press editorials - Biber (1988)        -10    -0.8     1.9     3.1     0.3     1.5
    Text types: 86% General narrative exposition; 11% Involved persuasion; 4% Learned exposition
Difference                             1.6    0.52    2.48     0.2     1.2    1.17

Press reviews - MAT                 -12.45   -0.74    5.38   -2.32    0.36   -1.01
    Text types: 53% General narrative exposition; 47% Learned exposition
Press reviews - Biber (1988)         -13.9    -1.6     4.3    -2.8     0.8      -1
    Text types: 47% Learned exposition; 47% General narrative exposition; 6% Scientific exposition
Difference                            1.45    0.86    1.08    0.48    0.44    0.01

Religion - MAT                       -4.26    0.17    4.69    0.85    2.22    1.01
    Text types: 65% General narrative exposition; 29% Involved persuasion; 6% Scientific exposition
Religion - Biber (1988)                 -7    -0.7     3.7     0.2     1.4       1
    Text types: 59% General narrative exposition; 18% Involved persuasion; 18% Learned exposition; 6% Imaginative narrative
Difference                            2.74    0.87    0.99    0.65    0.82    0.01

Hobbies - MAT                        -9.42    -2.1    3.15    1.51    2.54   -0.35
    Text types: 34% General narrative exposition; 24% Learned exposition; 24% Involved persuasion; 18% Scientific exposition
Hobbies - Biber (1988)               -10.1    -2.9     0.3     1.7     1.2    -0.7
    Text types: 43% General narrative exposition; 21% Learned exposition; 21% Involved persuasion; 7% Scientific exposition; 7% Situated reportage
Difference                            0.68     0.8    2.85    0.19    1.34    0.35

Popular lore - MAT                   -9.58    0.31    3.42   -0.61     1.4   -0.64
    Text types: 36% Learned exposition; 32% General narrative exposition; 20% Involved persuasion; 2% Imaginative narrative; 9% Scientific exposition
Popular lore - Biber (1988)           -9.3    -0.1     2.3    -0.3     0.1    -0.8
    Text types: 36% Learned exposition; 36% Involved persuasion; 21% General narrative exposition; 7% Imaginative narrative
Difference                            0.28    0.41    1.12    0.31     1.3    0.16

Academic prose - MAT                -12.16   -2.16    5.38   -0.02    5.14    0.23
    Text types: 56% Scientific exposition; 24% Learned exposition; 14% General narrative exposition; 6% Involved persuasion
Academic prose - Biber (1988)       -14.09    -2.6     4.2    -0.5     5.5     0.5
    Text types: 44% Scientific exposition; 31% Learned exposition; 17% General narrative exposition; 9% Involved persuasion
Difference                            1.93    0.44    1.18    0.48    0.36    0.27

General fiction - MAT                 0.35    6.26    0.03    1.79   -0.45   -0.75
    Text types: 55% Imaginative narrative; 31% General narrative exposition; 10% Involved persuasion; 3% Learned exposition
General fiction - Biber (1988)        -0.8     5.9    -3.1     0.9    -2.5    -1.6
    Text types: 51% Imaginative narrative; 41% General narrative exposition; 3% Informational interaction; 3% Involved persuasion
Difference                            1.15    0.36    3.13    0.89    2.05    0.85

Mystery fiction - MAT                 0.82    5.76    -0.7    1.55   -0.69   -1.13
    Text types: 67% Imaginative narrative; 29% General narrative exposition; 4% Involved persuasion
Mystery fiction - Biber (1988)        -0.2       6    -3.6    -0.7    -2.8    -1.9
    Text types: 70% Imaginative narrative; 23% General narrative exposition; 8% Situated reportage
Difference                            1.02    0.24     2.9    2.25    2.11    0.77

Science fiction - MAT                -5.01     6.1    1.08    0.21   -0.54   -0.54
    Text types: 83% General narrative exposition; 17% Imaginative narrative
Science fiction - Biber (1988)        -6.1     5.9    -1.4    -0.7    -2.5    -1.6
    Text types: 50% General narrative exposition; 33% Imaginative narrative; 17% Situated reportage
Difference                            1.09     0.2    2.48    0.91    1.96    1.06

Adventure fiction - MAT              -0.85    5.89   -1.29    0.19   -0.97   -1.29
    Text types: 69% Imaginative narrative; 24% General narrative exposition; 3% Involved persuasion; 3% Learned exposition
Adventure fiction - Biber (1988)         0     5.5    -3.8    -1.2    -2.5    -1.9
    Text types: 70% Imaginative narrative; 31% General narrative exposition
Difference                            0.85    0.39    2.51    1.39    1.53    0.61

Romantic fiction - MAT                3.55    6.71   -0.88    2.35   -1.26      -1
    Text types: 79% Imaginative narrative; 17% General narrative exposition; 3% Involved persuasion
Romantic fiction - Biber (1988)        4.3     7.2    -4.1     1.8    -3.1    -1.2
    Text types: 92% Imaginative narrative; 8% General narrative exposition
Difference                            0.75    0.49    3.22    0.55    1.84     0.2

Humour - MAT                         -6.19    1.43    1.62    0.43    0.65   -0.56
    Text types: 78% General narrative exposition; 11% Imaginative narrative; 11% Involved persuasion
Humour - Biber (1988)                 -7.8     0.9    -0.8    -0.3    -0.4    -1.5
    Text types: 89% General narrative exposition; 11% Involved persuasion
Difference                            1.61    0.53    2.42    0.73    1.05    0.94

The scores obtained by MAT for the Dimensions show that MAT is largely successful in replicating Biber’s (1988) analysis. For Dimension 1, the difference ranges from a minimum of 0.28 for Popular Lore to a maximum of 2.74 for Religion. However, given the wide span of Dimension 1 scores, even a difference of 3 does still correctly locate the text in the right area of Dimension 1. For Dimension 2, the difference ranges from a minimum of 0.2 for Science Fiction to a maximum of 0.87 for Religion. This difference of less than a point is not enough to cause any significant difference in terms of text type assignation and/or location of the analysed text(s) along Dimension 2. For Dimension 3, the difference ranges from a minimum of 0.99 for Religion to a maximum of 3.22 for Romantic Fiction. Given the limited range of Dimension 3, differences of magnitude 2 or more can create some problems in the reliability of MAT Dimension 3 scores. For Dimension 4, the differences range from a minimum of 0.19 for Hobbies to a maximum of 2.25 for Mystery Fiction.
Apart from this value, all other values show that there are no large differences between Biber’s (1988) scores and MAT’s. For Dimension 5, the differences range from a minimum of 0.08 for Press Reportage to a maximum of 2.11 for Mystery Fiction. Apart from this value, all other values show that there are no large differences between Biber’s (1988) scores and MAT’s. Finally, for Dimension 6, the differences range from a minimum of 0.01 for Press Reviews and Religion to a maximum of 1.06 for Science Fiction, confirming that there are no large differences between Biber’s (1988) scores and MAT’s analysis.

In general, therefore, it is possible to conclude that MAT performs well in replicating Biber’s (1988) study. The only anomalous scores are the ones obtained for Dimension 3. An exploration of the z-scores showed that the scores produced by MAT for Dimension 3 are inflated because of high z-scores for general adverbs. However, no cause for this variation has been identified so far. Until the problem is resolved, Dimension 3 scores produced by MAT should be treated with caution. Although the differences for Dimension 3 are moderate, they do not influence the assignment of the text type in many cases, since most of the genres are unmarked for Dimension 3. The assignment of text types given by MAT is generally accurate, with some small inaccuracies probably caused by small differences between the dictionaries or rules employed by the Stanford Tagger and the tagger used in Biber (1988).

Another test was run for the Brown corpus and the results are presented below.
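As a brief aside, the ‘Difference’ rows of the LOB comparison can be reproduced with a few lines of Python. This is only a sketch: it assumes each difference is the absolute gap between the MAT score and Biber’s (1988) score, rounded to two decimals, which matches the printed figures for the two genres copied here. The variable and function names are illustrative.

```python
DIMENSIONS = ("D1", "D2", "D3", "D4", "D5", "D6")

# Scores copied from the LOB comparison (Table 1) for two genres.
MAT_SCORES = {
    "Press reportage": (-14.02, 0.97, 2.81, -0.38, 0.52, -0.72),
    "Religion": (-4.26, 0.17, 4.69, 0.85, 2.22, 1.01),
}
BIBER_SCORES = {
    "Press reportage": (-15.01, 0.4, -0.3, -0.7, 0.6, -0.9),
    "Religion": (-7, -0.7, 3.7, 0.2, 1.4, 1),
}


def differences(genre):
    """Absolute MAT-vs-Biber difference per Dimension, rounded to 2 decimals."""
    pairs = zip(MAT_SCORES[genre], BIBER_SCORES[genre])
    return [round(abs(mat - biber), 2) for mat, biber in pairs]


for genre in MAT_SCORES:
    print(genre, dict(zip(DIMENSIONS, differences(genre))))
```

Taking the minimum and maximum of each column over all genres gives the ranges quoted in the discussion of the LOB results.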
Table 2 – MAT analysis of the Brown corpus compared to Biber’s (1988) results

Genre and source                        D1      D2      D3      D4      D5      D6
Press reportage - MAT               -17.61    0.09    4.51   -1.55    0.85   -1.11
    Text types: 75% Learned exposition; 20% General narrative exposition; 4% Scientific exposition
Press reportage - Biber (1988)      -15.01     0.4    -0.3    -0.7     0.6    -0.9
    Text types: 73% General narrative exposition; 25% Learned exposition; 2% Scientific exposition
Difference                             2.6    0.31    4.81    0.85    0.25    0.21

Press editorials - MAT              -10.71   -0.59     4.5    1.39    0.63   -0.28
    Text types: 63% General narrative exposition; 7% Involved persuasion; 26% Learned exposition; 4% Scientific exposition
Press editorials - Biber (1988)        -10    -0.8     1.9     3.1     0.3     1.5
    Text types: 86% General narrative exposition; 11% Involved persuasion; 4% Learned exposition
Difference                            0.71    0.21     2.6    1.71    0.33    1.78

Press reviews - MAT                 -13.83   -1.32    5.27   -3.31    0.41   -1.08
    Text types: 59% Learned exposition; 41% General narrative exposition
Press reviews - Biber (1988)         -13.9    -1.6     4.3    -2.8     0.8      -1
    Text types: 47% Learned exposition; 47% General narrative exposition; 6% Scientific exposition
Difference                            0.07    0.28    0.97    0.51    0.39    0.08

Religion - MAT                       -7.17   -0.11     5.1    0.39    2.11    0.49
    Text types: 35% General narrative exposition; 29% Involved persuasion; 24% Learned exposition; 12% Scientific exposition
Religion - Biber (1988)                 -7    -0.7     3.7     0.2     1.4       1
    Text types: 59% General narrative exposition; 18% Involved persuasion; 18% Learned exposition; 6% Imaginative narrative
Difference                            0.17    0.59     1.4    0.19    0.71    0.51

Hobbies - MAT                       -12.44   -2.66    4.47   -0.86    1.34   -1.15
    Text types: 50% Learned exposition; 36% General narrative exposition; 6% Involved persuasion; 8% Scientific exposition
Hobbies - Biber (1988)               -10.1    -2.9     0.3     1.7     1.2    -0.7
    Text types: 43% General narrative exposition; 21% Learned exposition; 21% Involved persuasion; 7% Scientific exposition; 7% Situated reportage
Difference                            2.34    0.24    4.17    2.56    0.14    0.45

Popular lore - MAT                   -13.3    -0.1     3.9   -1.03    1.38   -0.67
    Text types: 44% Learned exposition; 42% General narrative exposition; 8% Involved persuasion; 6% Scientific exposition
Popular lore - Biber (1988)           -9.3    -0.1     2.3    -0.3     0.1    -0.8
    Text types: 36% Learned exposition; 36% Involved persuasion; 21% General narrative exposition; 7% Imaginative narrative
Difference                               4       0     1.6    0.73    1.28    0.13

Academic prose - MAT                -13.58   -2.33    5.93   -0.88    4.48    0.01
    Text types: 38% Scientific exposition; 38% Learned exposition; 23% General narrative exposition; 3% Involved persuasion
Academic prose - Biber (1988)       -14.09    -2.6     4.2    -0.5     5.5     0.5
    Text types: 44% Scientific exposition; 31% Learned exposition; 17% General narrative exposition; 9% Involved persuasion
Difference                            0.51    0.27    1.73    0.38    1.02    0.49

General fiction - MAT                -5.83    5.86    0.19   -0.33   -0.44   -1.22
    Text types: 66% General narrative exposition; 24% Imaginative narrative; 10% Involved persuasion
General fiction - Biber (1988)        -0.8     5.9    -3.1     0.9    -2.5    -1.6
    Text types: 51% Imaginative narrative; 41% General narrative exposition; 3% Informational interaction; 3% Involved persuasion
Difference                            5.03    0.04    3.29    1.23    2.06    0.38

Mystery fiction - MAT                -2.21    5.57   -1.22    0.13   -1.03      -1
    Text types: 46% General narrative exposition; 42% Imaginative narrative; 13% Involved persuasion
Mystery fiction - Biber (1988)        -0.2       6    -3.6    -0.7    -2.8    -1.9
    Text types: 70% Imaginative narrative; 23% General narrative exposition; 8% Situated reportage
Difference                            2.01    0.43    2.38    0.83    1.77     0.9

Science fiction - MAT                 -4.1    4.79     1.3    0.12    0.79   -0.78
    Text types: 50% General narrative exposition; 17% Imaginative narrative; 17% Involved persuasion; 17% Learned exposition
Science fiction - Biber (1988)        -6.1     5.9    -1.4    -0.7    -2.5    -1.6
    Text types: 50% General narrative exposition; 33% Imaginative narrative; 17% Situated reportage
Difference                               2    1.11     2.7    0.82    3.29    0.82

Adventure fiction - MAT              -6.05    5.88   -0.81   -1.78   -1.05   -1.39
    Text types: 66% General narrative exposition; 31% Imaginative narrative; 3% Learned exposition
Adventure fiction - Biber (1988)         0     5.5    -3.8    -1.2    -2.5    -1.9
    Text types: 70% Imaginative narrative; 31% General narrative exposition
Difference                            6.05    0.38    2.99    0.58    1.45    0.51

Romantic fiction - MAT                0.83    6.02    0.41   -0.08   -1.15   -1.08
    Text types: 59% Imaginative narrative; 31% General narrative exposition; 10% Involved persuasion
Romantic fiction - Biber (1988)        4.3     7.2    -4.1     1.8    -3.1    -1.2
    Text types: 92% Imaginative narrative; 8% General narrative exposition
Difference                            3.47    1.18    4.51    1.88    1.95    0.12

Humour - MAT                         -6.76    2.96    2.56   -1.16    0.42   -0.46
    Text types: 67% General narrative exposition; 22% Imaginative narrative; 11% Learned exposition
Humour - Biber (1988)                 -7.8     0.9    -0.8    -0.3    -0.4    -1.5
    Text types: 89% General narrative exposition; 11% Involved persuasion
Difference                            1.04    2.06    3.36    0.86    0.82    1.04

Greater differences can be observed between MAT scores and Biber’s (1988) scores. However, given that the Brown corpus contains identical genres but different texts from the LOB corpus, the results obtained from the analysis of the Brown corpus suggest that the Dimensions found by Biber (1988) are still valid for those genres even when considering different texts. The results obtained with the latter experiment are encouraging and suggest that MAT can be used to assign Biber’s (1988) Dimension scores to texts. Furthermore, MAT can be used to categorise a text for its text type, as proposed by Biber (1989).

List of the variables

Each variable is described in a short paragraph. Next to the name of the variable is the tag used by the present tagger to identify it. An asterisk appears next to the name of the variables for which Biber (1988) manually checked the results. The present version of the tagger does not allow any manual intervention in the tagging process. However, the texts can be manually checked before the analysis takes place.

AMP: Amplifiers
This tag finds any of the items in this list: absolutely, altogether, completely, enormously, entirely, extremely, fully, greatly, highly, intensely, perfectly, strongly, thoroughly, totally, utterly, very.

ANDC: Independent clause coordination
This tag is assigned to the word and when it is found in one of the following patterns: (1) preceded by a comma and followed by it, so, then, you, there + BE, or a demonstrative pronoun (DEMP) or the subject form of a personal pronoun; (2) preceded by any punctuation; (3) followed by a WH pronoun or any WH word, an adverbial subordinator (CAUS, CONC, COND, OSUB) or a discourse particle (DPAR) or a conjunct (CONJ).

AWL: Average word length
Mean length of the words in the text in orthographic letters. A word is any string separated by space in the text tokenised by the Stanford Tagger.

BEMA: Be as main verb
BE is tagged as being a main verb in the following pattern: BE followed by a determiner (DT), or a possessive pronoun (PRP$) or a preposition (PIN) or an adjective (JJ). This algorithm was improved in the present tagger by taking into account that adverbs or negations can appear between the verb BE and the rest of the pattern. Furthermore, the algorithm was slightly modified and improved: (a) the problem of a double-coding of any Existential there followed by a form of BE as a BEMA was solved by imposing the condition that there should not appear before the pattern; (b) the cardinal numbers (CD) tag and the personal pronoun (PRP) tag were added to the list of items that can follow the form of BE.

BYPA: By-passives
The tagger assigns this tag every time the patterns for PASS are found and the preposition by follows it.

CAUS: Causative adverbial subordinators
This tag identifies any occurrence of the word because.

CONC: Concessive adverbial subordinators
This tag identifies any occurrence of the words although and though. Biber’s algorithm was improved by including the abbreviation tho.

COND: Conditional adverbial subordinators
This tag identifies any occurrence of the words if and unless.
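The three subordinator tags above (CAUS, CONC, COND) are plain word lookups, which a short Python sketch can make concrete. This is an illustration only: the function name is hypothetical, and it works on a bare list of word tokens, whereas MAT itself operates on the output of the Stanford Tagger. Matching case-insensitively is also an assumption.

```python
# Word-to-tag table built from the CAUS, CONC and COND entries.
SUBORDINATOR_TAGS = {
    "because": "CAUS",
    "although": "CONC",
    "though": "CONC",
    "tho": "CONC",  # abbreviation that the manual says was added to Biber's list
    "if": "COND",
    "unless": "COND",
}


def tag_subordinators(tokens):
    """Pair each token with its subordinator tag, or None when it has none."""
    return [(token, SUBORDINATOR_TAGS.get(token.lower())) for token in tokens]


print(tag_subordinators(["Because", "it", "rained", "tho", "unless", "dry"]))
```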

CONJ: Conjuncts
This tag finds any of the items in this list: punctuation+else, punctuation+altogether, punctuation+rather, alternatively, consequently, conversely, e.g., furthermore, hence, however, i.e., instead, likewise, moreover, namely, nevertheless, nonetheless, notwithstanding, otherwise, similarly, therefore, thus, viz., in comparison, in contrast, in particular, in addition, in conclusion, in consequence, in sum, in summary, for example, for instance, instead of, by contrast, by comparison, in any event, in any case, in other words, as a result, as a consequence, on the contrary, on the other hand. Some minor inconsistencies in the said list were fixed. For example, Biber lists the word rather twice in this list, making the second mention redundant. Rather was counted only when it appeared after a punctuation mark. The same applies to altogether. In cases of multi-word units such as on the other hand, only the first word is tagged as CONJ and the other words are tagged with the tag NULL.

CONT: Contractions
The contractions were tagged by identifying any instance of apostrophe followed by a tagged word OR any instance of the item n’t.

DEMO: Demonstratives
A demonstrative is found when the words that, this, these, those have not been tagged as either DEMP, TOBJ, TSUB, THAC, or THVC.

DEMP: Demonstrative pronouns*
The program tags as demonstrative pronouns the words those, this, these when they are followed by a verb (any tag starting with V) or auxiliary verb (modal verbs in the form of MD tags or forms of DO or forms of HAVE or forms of BE) or a punctuation mark or a WH pronoun or the word and. The word that is tagged as a demonstrative pronoun when it follows the said pattern or when it is followed by ‘s or is and, at the same time, it has not been already tagged as a TOBJ, TSUB, THAC or THVC.

DPAR: Discourse particles
The program tags as discourse particles the words well, now, anyhow, anyways preceded by a punctuation mark.
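The DPAR rule above depends on the preceding token, so it makes a compact example of a context-sensitive pattern. The Python sketch below is an illustration, not MAT's code: the function names are hypothetical, the input is assumed to be a list of tokens with punctuation already split off, and a particle at the very start of the text is not counted because nothing precedes it (the manual does not say how that case is handled).

```python
import string

DISCOURSE_PARTICLES = {"well", "now", "anyhow", "anyways"}


def is_punctuation(token):
    """True when the token consists only of ASCII punctuation characters."""
    return token != "" and all(ch in string.punctuation for ch in token)


def find_discourse_particles(tokens):
    """Return the indices of particles that directly follow a punctuation token."""
    hits = []
    for i in range(1, len(tokens)):
        if tokens[i].lower() in DISCOURSE_PARTICLES and is_punctuation(tokens[i - 1]):
            hits.append(i)
    return hits


tokens = ["Yes", ".", "Well", ",", "I", "know", "now"]
print(find_discourse_particles(tokens))
```

Here "Well" qualifies because it follows a full stop, while the final "now" does not because it follows a word.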

DWNT: Downtoners
This tag finds any of the items in this list: almost, barely, hardly, merely, mildly, nearly, only, partially, partly, practically, scarcely, slightly, somewhat. The word almost was classified by Biber as being both a hedge and a downtoner. In the present tagger almost is considered a downtoner only.

EMPH: Emphatics
This tag finds any of the items in this list: just, really, most, more, real+adjective, so+adjective, any form of DO followed by a verb, for sure, a lot, such a. In cases of multi-word units such as a lot, only the first word is tagged as EMPH and the other words are tagged with the tag NULL.

EX: Existential there
Existential there is tagged by the Stanford Tagger as EX (for further reference: http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf).

FPP1: First person pronouns
Any item of this list: I, me, us, my, we, our, myself, ourselves.

GER: Gerunds*
The program tags as gerunds any nominal form (N) that ends in –ing or –ings. To improve the accuracy, only words longer than 10 characters are considered as gerunds.

HDG: Hedges
This tag finds any of the items in this list: maybe, at about, something like, more or less, sort of, kind of (these two items must be preceded by a determiner (DT), a quantifier (QUAN), a cardinal number (CD), an adjective (JJ or PRED), a possessive pronoun (PRP$) or WH word (see entry on WH-questions)). In cases of multi-word units such as more or less, only the first word is tagged as HDG and the other words are tagged with the tag NULL.

INPR: Indefinite pronouns
Any item of this list: anybody, anyone, anything, everybody, everyone, everything, nobody, none, nothing, nowhere, somebody, someone, something.

JJ: Attributive adjectives (e.g. the big horse)
Biber (1988) specifies that attributive adjectives were counted when an adjective was followed by another adjective or a noun.
However, Biber also states that all the adjectives that were not identified as predicative were counted as attributive adjectives. Therefore, the present tagger does not have an algorithm to identify attributive adjectives. All the adjectives that the Stanford Tagger has already tagged as JJ, JJS, or JJR are considered attributive adjectives and are all re-assigned to the tag JJ. The predicative adjectives are tagged by another algorithm and are therefore distinguished from the rest.

NEMD: Necessity modals
The necessity modals listed by Biber (1988): ought, should, must.

NN: Total other nouns
Any noun that has been tagged by the Stanford Tagger as NN and that has not been identified as a nominalisation or a gerund is left as such. Plural noun (NNS) and proper noun (NNP and NNPS) tags are changed to NN and included in this count.

NOMZ: Nominalizations
Any noun ending in -tion, -ment, -ness, or -ity, plus the plural forms. Although Biber (1988) does not mention that this variable was checked manually, it is likely that a stop list was used to avoid obviously erroneous tagging (e.g. city). However, this was not indicated in the appendix of Biber (1988).

OSUB: Other adverbial subordinators
This tag identifies any occurrence of the words: since, while, whilst, whereupon, whereas, whereby, so that (followed by a word that is neither a noun nor an adjective), such that (followed by a word that is neither a noun nor an adjective), inasmuch as, forasmuch as, insofar as, insomuch as, as long as, as soon as. In cases of multi-word units such as as long as, only the first word is tagged as OSUB and the other words are tagged with the tag NULL.

Other Stanford Tagger tags
If the user selects "all tags" from the main window then all the tags assigned by the Stanford Tagger are counted as well.
A list of the Stanford Tagger tags and the description of how they are identified can be found here: http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf

PASS: Agentless passives
This tag is assigned when one of the two following patterns is found: (a) any form of BE followed by a participle (VBN or VBD) plus one or two optional intervening adverbs (RB) or negations; (b) any form of BE followed by a nominal form (a noun, NN, NNP or personal pronoun, PRP) and a participle (VBN or VBD). This algorithm was slightly changed from Biber's version in the present tagger. It was felt necessary to allow for an intervening negation in pattern (b). This tag is therefore also assigned in the cases in which a negation precedes the nominal form of pattern (b).

PASTP: Past participial clauses* (e.g. Built in a single week, the house would stand for fifty years)
This tag is assigned when the following pattern is found: a punctuation mark followed by a past participial form of a verb (VBN) followed by a preposition (PIN) or an adverb (RB).

PEAS: Perfect aspect
This is calculated by counting how many times a form of HAVE is followed by a VBD or VBN tag (a past or participle form of any verb). These are also counted when an adverb (RB) or negation (XX0) occurs between the two. The interrogative version is counted too. This is achieved by counting how many times a form of HAVE is followed by a nominal form (noun, NN, proper noun, NP or personal pronoun, PRP) and then followed by a VBD or VBN tag. As for the affirmative version, the latter algorithm also accounts for intervening adverbs or negations.

PHC: Phrasal coordination
This tag is assigned to any and that is preceded and followed by the same tag, when this tag is an adverb tag, an adjective tag, a verb tag or a noun tag.

PIN: Total prepositional phrases
This tag identifies any occurrence of the prepositions listed by Biber (1988) under this category.
As described in the section on infinitives, the preposition to is disambiguated from the infinitive marker to. Biber (1988) does not specify whether he included every instance of the word to or distinguished the two grammatical functions of this word. However, it was felt that the distinction needed to be applied in the present tagger for improved accuracy.

PIRE: Pied-piping relative clauses (e.g. the manner in which he was told)
This tag is assigned when the following pattern is found: any preposition (PIN) followed by who, whom, whose or which.

PIT: Pronoun it
Any pronoun it. Although not specified in Biber (1988), the present program also tags its and itself as "Pronoun it".

PLACE: Place adverbials
Any item in this list: aboard, above, abroad, across, ahead, alongside, around, ashore, astern, away, behind, below, beneath, beside, downhill, downstairs, downstream, east, far, hereabouts, indoors, inland, inshore, inside, locally, near, nearby, north, nowhere, outdoors, outside, overboard, overland, overseas, south, underfoot, underground, underneath, uphill, upstairs, upstream, west. If an item is tagged by the Stanford Tagger as a proper noun (NNP), it is not tagged as a place adverbial.

POMD: Possibility modals
The possibility modals listed by Biber (1988): can, may, might, could.

PRED: Predicative adjectives (e.g. the horse is big)
The tagger tags as PRED the adjectives that are found in the following pattern: any form of BE followed by an adjective (JJ) followed by a word that is NOT another adjective, an adverb (RB) or a noun (N). If an adverb or negation intervenes between the adjective and the word after it, the tag is still assigned. A modification to Biber's algorithm was implemented in the present tagger to improve its accuracy. An adjective is also tagged as predicative if it is preceded by another predicative adjective followed by a phrasal coordinator (see the entry on PHC). This pattern accounts for cases such as: the horse is big and fast.
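The core PRED pattern (a form of BE, then an adjective, then a word that is not an adjective, adverb or noun) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: tokens are hypothetical (word, tag) pairs, the list of BE forms is written out by hand, and the sketch deliberately leaves out the intervening adverb or negation and the coordination extension (big and fast) described above.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical (word, tag) representation

# Hand-written assumption: the forms of BE checked by this sketch.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

# Tags that block the PRED reading when they follow the adjective.
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS", "PRED"}
ADVERB_TAGS = {"RB", "RBR", "RBS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
BLOCKING_TAGS = ADJECTIVE_TAGS | ADVERB_TAGS | NOUN_TAGS


def tag_predicative(tokens: List[Token]) -> List[Token]:
    """Re-tag JJ as PRED for the basic pattern BE + JJ + non-blocking word."""
    out = list(tokens)
    # The pattern needs a token before and a token after the adjective.
    for i in range(1, len(tokens) - 1):
        previous_word = tokens[i - 1][0].lower()
        word, tag = tokens[i]
        next_tag = tokens[i + 1][1]
        if previous_word in BE_FORMS and tag == "JJ" and next_tag not in BLOCKING_TAGS:
            out[i] = (word, "PRED")
    return out


predicative = tag_predicative(
    [("the", "DT"), ("horse", "NN"), ("is", "VBZ"), ("big", "JJ"), (".", ".")])
attributive = tag_predicative(
    [("it", "PRP"), ("is", "VBZ"), ("big", "JJ"), ("news", "NN"), (".", ".")])
```

In the first sentence big is followed by a punctuation mark and is re-tagged; in the second it is followed by a noun and keeps its JJ tag.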
PRESP: Present participial clauses* (e.g. Stuffing his mouth with cookies, Joe ran out the door)
This tag is assigned when the following pattern is found: a punctuation mark is followed by a present participial form of a verb (VBG) followed by a preposition (PIN), a determiner (DT, QUAN, CD), a WH pronoun, a WH possessive pronoun (WP$), any WH word, any pronoun (PRP) or any adverb (RB).

PRIV: Private verbs
This tag finds any of the items listed by Quirk et al. (1985: 1181–2): accept, accepts, accepting, accepted, anticipate, anticipates, anticipating, anticipated, ascertain, ascertains, ascertaining, ascertained, assume, assumes, assuming, assumed, believe, believes, believing, believed, calculate, calculates, calculating, calculated, check, checks, checking, checked, conclude, concludes, concluding, concluded, conjecture, conjectures, conjecturing, conjectured, consider, considers, considering, considered, decide, decides, deciding, decided, deduce, deduces, deducing, deduced, deem, deems, deeming, deemed, demonstrate, demonstrates, demonstrating, demonstrated, determine, determines, determining, determined, discern, discerns, discerning, discerned, discover, discovers, discovering, discovered, doubt, doubts, doubting, doubted, dream, dreams, dreaming, dreamt, dreamed, ensure, ensures, ensuring, ensured, establish, establishes, establishing, established, estimate, estimates, estimating, estimated, expect, expects, expecting, expected, fancy, fancies, fancying, fancied, fear, fears, fearing, feared, feel, feels, feeling, felt, find, finds, finding, found, foresee, foresees, foreseeing, foresaw, forget, forgets, forgetting, forgot, forgotten, gather, gathers, gathering, gathered, guess, guesses, guessing, guessed, hear, hears, hearing, heard, hold, holds, holding, held, hope, hopes, hoping, hoped, imagine, imagines, imagining, imagined, imply, implies, implying, implied, indicate, indicates, indicating, indicated, infer, infers, inferring, inferred, insure,
insures, insuring, insured, judge, judges, judging, judged, know, knows, knowing, knew, known, learn, learns, learning, learnt, learned, mean, means, meaning, meant, note, notes, noting, noted, notice, notices, noticing, noticed, observe, observes, observing, observed, perceive, perceives, perceiving, perceived, presume, presumes, presuming, presumed, presuppose, presupposes, presupposing, presupposed, pretend, pretends, pretending, pretended, prove, proves, proving, proved, realize, realise, realising, realizing, realises, realizes, realised, realized, reason, reasons, reasoning, reasoned, recall, recalls, recalling, recalled, reckon, reckons, reckoning, reckoned, recognize, recognise, recognizes, recognises, recognizing, recognising, recognized, recognised, reflect, reflects, reflecting, reflected, remember, remembers, remembering, remembered, reveal, reveals, revealing, revealed, see, sees, seeing, saw, seen, sense, senses, sensing, sensed, show, shows, showing, showed, shown, signify, signifies, signifying, signified, suppose, supposes, supposing, supposed, suspect, suspects, suspecting, suspected, think, thinks, thinking, thought, understand, understands, understanding, understood.

PRMD: Predictive modals
The predictive modals listed by Biber (1988): will, would, shall and their contractions: 'd_MD, ll_MD, wo_MD, sha_MD.

PROD: Pro-verb do
Any form of DO that is used as a main verb, therefore excluding DO when used as an auxiliary verb. The tagger tags as PROD any DO that is NOT in either of the following patterns: (a) DO followed by a verb (any tag starting with V) or followed by adverbs (RB), negations and then a verb (V); (b) DO preceded by a punctuation mark or a WH pronoun (the list of WH pronouns is in Biber (1988)).

PUBV: Public verbs
This tag finds any of the items listed by Quirk et al.
(1985: 1180–1): acknowledge, acknowledged, acknowledges, acknowledging, add, adds, adding, added, admit, admits, admitting, admitted, affirm, affirms, affirming, affirmed, agree, agrees, agreeing, agreed, allege, alleges, alleging, alleged, announce, announces, announcing, announced, argue, argues, arguing, argued, assert, asserts, asserting, asserted, bet, bets, betting, boast, boasts, boasting, boasted, certify, certifies, certifying, certified, claim, claims, claiming, claimed, comment, comments, commenting, commented, complain, complains, complaining, complained, concede, concedes, conceding, conceded, confess, confesses, confessing, confessed, confide, confides, confiding, confided, confirm, confirms, confirming, confirmed, contend, contends, contending, contended, convey, conveys, conveying, conveyed, declare, declares, declaring, declared, deny, denies, denying, denied, disclose, discloses, disclosing, disclosed, exclaim, exclaims, exclaiming, exclaimed, explain, explains, explaining, explained, forecast, forecasts, forecasting, forecasted, foretell, foretells, foretelling, foretold, guarantee, guarantees, guaranteeing, guaranteed, hint, hints, hinting, hinted, insist, insists, insisting, insisted, maintain, maintains, maintaining, maintained, mention, mentions, mentioning, mentioned, object, objects, objecting, objected, predict, predicts, predicting, predicted, proclaim, proclaims, proclaiming, proclaimed, promise, promises, promising, promised, pronounce, pronounces, pronouncing, pronounced, prophesy, prophesies, prophesying, prophesied, protest, protests, protesting, protested, remark, remarks, remarking, remarked, repeat, repeats, repeating, repeated, reply, replies, replying, replied, report, reports, reporting, reported, say, says, saying, said, state, states, stating, stated, submit, submits, submitting, submitted, suggest, suggests, suggesting, suggested, swear, swears, swearing, swore, sworn, testify, testifies, testifying, testified, vow, vows,
vowing, vowed, warn, warns, warning, warned, write, writes, writing, wrote, written.

RB: Total adverbs
All the adverbs that the Stanford Tagger has already tagged as RB, RBS, RBR or WRB are re-assigned to the tag RB in order to have a final count of total adverbs (for further reference: http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf).

SERE: Sentence relatives* (e.g. Bob likes fried mangoes, which is disgusting)
A sentence relative is counted and tagged every time a punctuation mark is followed by the word which.

SMP: Seem|appear
Any occurrence of any of the forms of the two verbs seem and appear.

SPAU: Split auxiliaries (e.g. they are objectively shown that...)
Split auxiliaries are identified every time an auxiliary (any modal verb MD, or any form of DO, or any form of BE, or any form of HAVE) is followed by one or two adverbs and a verb base form.

SPIN: Split infinitives (e.g. he wants to convincingly prove that...)
Split infinitives are identified every time an infinitive marker to is followed by one or two adverbs and a verb base form.

SPP2: Second person pronouns
Any item of this list: you, your, yourself, yourselves, thy, thee, thyself, thou.

STPR: Stranded preposition (e.g. the candidate that I was thinking of)
A stranded preposition is identified every time a preposition is followed by a punctuation mark. However, this algorithm was improved by adding that the preposition cannot be besides, since this word can also be a conjunct and is therefore usually followed by a punctuation mark.

SUAV: Suasive verbs
This tag finds any of the items listed by Quirk et al.
(1985: 1182–3): agree, agrees, agreeing, agreed, allow, allows, allowing, allowed, arrange, arranges, arranging, arranged, ask, asks, asking, asked, beg, begs, begging, begged, command, commands, commanding, commanded, concede, concedes, conceding, conceded, decide, decides, deciding, decided, decree, decrees, decreeing, decreed, demand, demands, demanding, demanded, desire, desires, desiring, desired, determine, determines, determining, determined, enjoin, enjoins, enjoining, enjoined, ensure, ensures, ensuring, ensured, entreat, entreats, entreating, entreated, grant, grants, granting, granted, insist, insists, insisting, insisted, instruct, instructs, instructing, instructed, intend, intends, intending, intended, move, moves, moving, moved, ordain, ordains, ordaining, ordained, order, orders, ordering, ordered, pledge, pledges, pledging, pledged, pray, prays, praying, prayed, prefer, prefers, preferring, preferred, pronounce, pronounces, pronouncing, pronounced, propose, proposes, proposing, proposed, recommend, recommends, recommending, recommended, request, requests, requesting, requested, require, requires, requiring, required, resolve, resolves, resolving, resolved, rule, rules, ruling, ruled, stipulate, stipulates, stipulating, stipulated, suggest, suggests, suggesting, suggested, urge, urges, urging, urged, vote, votes, voting, voted.

SYNE: Synthetic negation
The following pattern was identified as synthetic negation: no followed by any adjective (both JJ and PRED) and any noun or proper noun. The words neither and nor were also tagged as instances of synthetic negation.

THAC: That adjective complements*
The program tags as THAC the word that when it is preceded by an adjective (JJ or a predicative adjective, PRED).
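The SYNE rule can be sketched in Python as below. The wording of the entry leaves some room for interpretation; this sketch assumes the reading in which no counts when the word immediately after it is an adjective or a noun, and in which neither and nor always count. The (word, tag) token format and the function name are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical (word, tag) representation

# Assumed reading of the entry: "no" counts when the next word is an
# adjective (JJ or PRED) or a noun / proper noun.
FOLLOWING_TAGS = {"JJ", "PRED", "NN", "NNS", "NNP", "NNPS"}


def tag_synthetic_negation(tokens: List[Token]) -> List[Token]:
    """Tag neither/nor, and no before an adjective or noun, as SYNE."""
    out = list(tokens)
    for i, (word, _) in enumerate(tokens):
        lower = word.lower()
        if lower in {"neither", "nor"}:
            out[i] = (word, "SYNE")
        elif lower == "no" and i + 1 < len(tokens):
            if tokens[i + 1][1] in FOLLOWING_TAGS:
                out[i] = (word, "SYNE")
    return out


negated = tag_synthetic_negation(
    [("There", "EX"), ("is", "VBZ"), ("no", "DT"), ("evidence", "NN")])
interjection = tag_synthetic_negation(
    [("No", "UH"), (",", ","), ("thanks", "NNS")])
```

The second example shows why the following word is checked: a sentence-initial No followed by a comma keeps its original tag.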
THATD: Subordinator that deletion
The tag THATD is added when one of the following patterns is found: (1) a public, private or suasive verb followed by a demonstrative pronoun (DEMP) or a subject form of a personal pronoun; (2) a public, private or suasive verb followed by a pronoun (PRP) or a noun (N) and then by a verb (V) or auxiliary verb; (3) a public, private or suasive verb followed by an adjective (JJ or PRED), an adverb (RB), a determiner (DT, QUAN, CD) or a possessive pronoun (PRP$) and then a noun (N) and then a verb or auxiliary verb, with the possibility of an intervening adjective (JJ or PRED) between the noun and its preceding word.

THVC: That verb complements*
This tag is assigned when the word that is: (1) preceded by and, nor, but, or, also or any punctuation mark and followed by a determiner (DT, QUAN, CD), a pronoun (PRP), there, a plural noun (NNS) or a proper noun (NNP); (2) preceded by a public, private or suasive verb or a form of seem or appear and followed by any word that is NOT a verb (V), auxiliary verb (MD, form of DO, form of HAVE, form of BE), a punctuation mark or the word and; (3) preceded by a public, private or suasive verb or a form of seem or appear and a preposition and up to four words that are not nouns (N).

TIME: Time adverbials
Any item in this list: afterwards, again, earlier, early, eventually, formerly, immediately, initially, instantly, late, lately, later, momentarily, now, nowadays, once, originally, presently, previously, recently, shortly, simultaneously, subsequently, today, to-day, tomorrow, to-morrow, tonight, to-night, yesterday. The list used in Biber (1988) was improved by adding that the word soon is not a time adverbial if it is followed by the word as. Furthermore, old spellings of the time adverbials starting with to- were added (e.g. to-morrow).

TO: Infinitives
The tag for infinitives is the Stanford Tagger Treebank tag TO.
The Stanford Tagger does not distinguish between to used as an infinitive marker and to used as a preposition. Therefore, an algorithm was implemented to identify instances of to as a preposition. This algorithm finds any occurrence of to followed by a subordinator (IN), a cardinal number (CD), a determiner (DT), an adjective (JJ), a possessive pronoun (PRP$), WH words (WP$, WDT, WP, WRB), a pre-determiner (PDT), a noun (N, NNS, NP, NPs), or a pronoun (PRP) and tags it as a preposition. The remaining instances of to are considered infinitive markers and therefore identify occurrences of infinitive clauses.

TOBJ: That relative clauses on object position* (e.g. the dog that I saw)
These are occurrences of that preceded by a noun and followed by a determiner (DT, QUAN, CD), a subject form of a personal pronoun, a possessive pronoun (PRP$), the pronoun it, an adjective (JJ), a plural noun (NNS), a proper noun (NNP) or a possessive noun (a noun (N) followed by a genitive marker (POS)). As Biber specifies, however, this algorithm does not distinguish between simple complements to nouns and true relative clauses.

TPP3: Third person pronouns
Any item of this list: she, he, they, her, him, them, his, their, himself, herself, themselves.

TSUB: That relative clauses on subject position* (e.g. the dog that bit me)
These are occurrences of that preceded by a noun (N) and followed by an auxiliary verb or a verb (V), with the possibility of an intervening adverb (RB) or negation (XX0).

TTR: Type-token ratio
In Biber (1988), the tagger considered only the first 400 tokens of the text and counted how many types were present in these 400 tokens. The resulting number was therefore the number of types in the first 400 words of the text. If a text was shorter than 400 tokens, it was excluded from this analysis. Biber presumably chose the number 400 because it provided a compromise between accuracy and the number of texts that could be measured.
Since the present tagger can be applied to corpora of different sizes, it was felt that this number should be left to the user to decide. The tagger will therefore ask the user to input the number before the tagging starts. It will then count how many types there are in the first X tokens, where X is the number given by the user. For texts shorter than X, the program will count the types for the whole text. The user can decide which number to use based on either the shortest text in the corpus or perhaps on the statistical mode of the number of tokens per text in the corpus. By default, this number is 400. The variable type-token ratio will be included in the calculation of Dimension 1 only if the user has not changed the default number. This is done in order to maintain compatibility with Biber's (1988) calculations.

VBD: Past tense
The Stanford Tagger tag VBD is used for this variable (for further reference: http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf).

VPRT: Present tense
Any verb that the Stanford Tagger has tagged as VBP or VBZ (present tense or third person present verb) is tagged as VPRT (for further reference: http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf).

WHCL: WH-clauses (e.g. I believed what he told me)
This tag is assigned when the following pattern is found: any public, private or suasive verb followed by any WH word, followed by a word that is NOT an auxiliary (tag MD for modal verbs, or a form of DO, or a form of HAVE, or a form of BE).

WHOBJ: WH relative clauses on object position (e.g. the man who Sally likes)
This tag is assigned when the following pattern is found: any word that is NOT a form of the words ASK or TELL followed by any word, followed by a noun (N), followed by a WH pronoun, followed by any word that is NOT an adverb (RB), a negation (XX0), a verb or an auxiliary verb (MD, forms of HAVE, BE or DO).
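The type count described in the TTR entry can be sketched in a few lines of Python. The function names are hypothetical, and lower-casing tokens before counting types is an assumption of this sketch; the manual does not say how MAT normalises case.

```python
from typing import List

DEFAULT_TOKEN_LIMIT = 400  # the default named in the TTR entry


def count_types(tokens: List[str], limit: int = DEFAULT_TOKEN_LIMIT) -> int:
    """Count distinct word types in the first `limit` tokens.

    Texts shorter than `limit` are counted in full, as described in the
    TTR entry. Lower-casing is an assumption of this sketch.
    """
    window = tokens[:limit]
    return len({token.lower() for token in window})


def ttr_counts_towards_dimension_1(limit: int) -> bool:
    """TTR enters Dimension 1 only when the default limit is unchanged."""
    return limit == DEFAULT_TOKEN_LIMIT


text = "The cat saw the dog and the cat ran".split()
types_in_first_four = count_types(text, limit=4)   # the, cat, saw
types_in_whole_text = count_types(text)            # text shorter than 400
```

Slicing with tokens[:limit] handles both cases at once: it truncates long texts and returns short texts unchanged.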
WHQU: Direct WH-questions
Any punctuation followed by a WH word (what, where, when, how, whether, why, whoever, whomever, whichever, wherever, whenever, whatever, however) and followed by any auxiliary verb (modal verbs in the form of MD tags or forms of DO or forms of HAVE or forms of BE). This algorithm was slightly changed by allowing an intervening word between the punctuation mark and the WH word. This allows WH-questions containing discourse markers such as 'so' or 'anyways' to be recognised. Furthermore, Biber's algorithm was improved by excluding WH words such as however or whatever that do not introduce WH-questions.

WHSUB: WH relative clauses on subject position (e.g. the man who likes popcorn)
This tag is assigned when the following pattern is found: any word that is NOT a form of the words ASK or TELL followed by a noun (N), then a WH pronoun, then any verb or auxiliary verb (V), with the possibility of an intervening adverb (RB) or negation (XX0) between the WH pronoun and the verb.

WZPAST: Past participial WHIZ deletion relatives* (e.g. The solution produced by this process)
This tag is assigned when the following pattern is found: a noun (N) or quantifier pronoun (QUPR) followed by a past participial form of a verb (VBN) followed by a preposition (PIN) or an adverb (RB) or a form of BE.

WZPRES: Present participial WHIZ deletion relatives* (e.g. the event causing this decline is....)
This tag is assigned when a present participial form of a verb (VBG) is preceded by a noun (NN).

XX0: Analytic negation
This tag was assigned to the word not and to the item n't_RB.

References
Biber, D. (1988). Variation across Speech and Writing. Cambridge: Cambridge University Press.
Biber, D. (1989). A typology of English texts. Linguistics, 27(1), 3–43.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman.
Stanford Tagger v. 3.1.5. Retrieved from: http://nlp.stanford.edu/software/tagger.shtml.
Translation - Portuguese
Manual: Multidimensional Analysis Tagger (v. 1.3)
The Multidimensional Analysis Tagger (MAT) is a program for Windows that replicates Biber's (1988) tagger for the multidimensional functional analysis of English texts, generally applied in studies of text type or genre variation. The program generates a grammatically annotated version of the selected text, as well as the statistics needed to perform a text-type or genre analysis. The program classifies texts into categories according to the Dimensions proposed by Biber (1988). In addition, the program offers a tool for visualising the Dimension features of the texts. The Dimensions and text types created by Biber are presented below.

This is an implementation of the tagger used in Biber (1988) and in many other works. This tagger aims to reproduce the analysis in Biber (1988) as faithfully as possible, taking into account the algorithms that the author presented in the appendix of his book. The basic analysis of the text is done through the Stanford Tagger. The present tagger includes a copy of the Stanford Tagger (2013), which is run automatically to produce an initial grammatical analysis. MAT goes beyond this analysis and expands the Stanford Tagger tag set by identifying the linguistic features used in Biber (1988). This manual provides a detailed description of the tagger as well as instructions for the user.

Referencing the tagger
To reference the tagger, use: Nini, A. 2015. Multidimensional Analysis Tagger (Version 1.3). Available at: http://sites.google.com/site/multidimensionaltagger
Since this program is based on the Stanford Tagger, the Stanford Tagger must be referenced whenever the program is used. To reference the Stanford Tagger, see the Stanford Tagger website: http://nlp.stanford.edu/software/tagger.shtml.
Architecture of the program
The program requires Java to run. Java can be downloaded from http://Java.com/en/download/index.jsp

Tagger
This module of the program accepts only text files in '.txt' format. The user can select either a folder of .txt files or a single .txt file. It is also possible to simply drag and drop a file or folder onto the program. MAT uses the Stanford Tagger for an initial part-of-speech analysis and then applies the classification in Biber (1988). Some basic Stanford Tagger tags are replaced by new, more specific tags. For example, negations and prepositions are given their own categories, separate from general adverbs and general subordinators respectively. The infinitive marker 'to' is distinguished from the preposition 'to'. Three tags are added to facilitate the identification of the linguistic features in Biber (1988): (1) indefinite pronouns (INPR): anybody, anyone, anything, everybody, everyone, everything, nobody, none, nothing, nowhere, somebody, someone, something; (2) quantifiers (QUAN): each, all, every, many, much, few, several, some, any; (3) quantifier pronouns (QUPR): everybody, somebody, anybody, everyone, someone, anyone, everything, something, anything. The full list of tags and a description of the algorithms used to find them are given below.

The texts tagged by the Stanford Tagger are placed in a folder named 'ST_name_of_folder' or 'ST_name_of_file'. The texts tagged by MAT appear in a folder named 'MAT_name_of_folder' or 'MAT_name_of_text'. Both folders are created in the folder selected for the analysis. When the tagger is started, one of its modules checks the encoding of the selected .txt files. The tagger flags any text in Unicode, and it is up to the user to change it to a compatible format such as ANSI or UTF-8.
Next, the tagger scans each of the .txt files for occurrences of curly quotes. This step is necessary because otherwise some contractions are not tagged correctly. When the tagger finds curly quotes, it replaces them with straight quotes. This causes the .txt file to be overwritten. If the original with curly quotes needs to be kept, it is advisable to create a backup copy before running MAT.

Analyser
This module of the program can be started with the 'Analyse' button or the 'Tag/Analyse' button. It is also possible to simply drag and drop a file or folder onto the program. When this module is started, the user is asked to enter the number of tokens for which the type-token ratio should be calculated (see the entry on type-token ratio in the list of variables for more details). By default, this number is 400, as in Biber (1988). The user then selects the Dimensions to be shown in the graphs. The analysis produces new files saved in the 'Statistics' folder, which is located in the same folder as the texts tagged by MAT. These files are:

1) 'Corpus_Statistics.txt': a tab-delimited file showing the frequency per 100 tokens of all the linguistic variables (see below) found in the input text or corpus. If the user selects the option 'all tags', this file lists all the tags in the text, including punctuation items. If instead the user selects the option 'only VASW tags', only the tags used in Biber (1988) are shown.

2) 'Zscores.txt': a tab-delimited file containing the z-scores of the linguistic variables for the input file or corpus. If the user selected a folder for the analysis, the mean for the corpus is shown. The z-scores are calculated from the means and standard deviations presented in Biber (1988: 77).
For each text and for the corpus as a whole, the program flags all z-scores with a magnitude greater than 2 as "Interesting variables". The z-scores displayed in this file are not affected by the user's choice of z-score correction. The 'z-score correction' option affects only the Dimension scores.

3) 'Dimensions.txt': a tab-delimited file containing the Dimension scores, as well as the means for the corpus if the user selected a folder of text files. The Dimension scores are calculated from the z-scores of the variables whose mean is higher than 1 in the table in Biber (1988: 77). The reliability of the Dimension scores was tested on the LOB and Brown corpora. The results of the tests are presented below. The program assigns each text to its closest text type, as proposed by Biber (1989), using Euclidean distance. If the user selected a folder for the analysis, the mean for the corpus is shown. If the user chooses to use the z-score correction, this choice is shown next to the scores. When the user selects the z-score correction, all the z-scores used to obtain the Dimension scores are checked. If a value is greater than 5, it is reduced to 5. This correction avoids the problem of a few infrequent variables affecting the overall Dimension score. This option should be used with care and is recommended only for very short texts.

4) 'Dimension#.png': a graph showing where the Dimension score of the text falls in comparison with the genres in Biber (1988: 172). The graph shows the mean of each genre as well as its range. If the user selected only one text, the score of that text is shown. If a corpus was selected, the mean and the range for the corpus are shown. Next to the title of the graph, the program prints the genre closest to the text or corpus.
MAT creates as many graphs as the user has chosen.

5) 'Text_types.png': a graph showing the location of the analysed text or corpus in relation to the eight text types in Biber (1989). Next to the title of the graph, the program prints the text type closest to the text or corpus. The text types are ranked by Euclidean distance.

Inspection tool
This tool allows the user to visualise the Dimension features of a single text. It is also possible to drag and drop a MAT file for analysis. The user can choose which Dimensions to visualise. When the tool is run, the program creates a file named "FILENAME_features.html" in the folder where the selected text is located. This tool can be used only with texts tagged by MAT.

Table of Biber's (1988) Dimensions
Dimension  Description
1  Dimension 1 shows the distinction between informal and scientific language. A low score on this variable indicates that the text is more informational, as for example academic language, whereas a high score indicates that the text is affective and interactional, as for example a casual conversation. A high score on this Dimension means that the text has many verbs and pronouns (among other features), whereas a low score means that it has many nouns, long words and adjectives (among other features).
2  Dimension 2 shows the distinction between narrative and non-narrative texts. A low score indicates that the text is not narrative, whereas a high score indicates a narrative, as for example a novel. A high score on this Dimension means that the text has many past tense verbs and third person pronouns (among other features).
3 | Dimension 3 is the opposition between Context-Independent Discourse and Context-Dependent Discourse. A low score on this variable indicates that the text depends on its context, as in a sports broadcast, whereas a high score indicates that the text does not depend on its context, as in academic prose. A high score on this Dimension means that the text contains many nominalizations (among other features), whereas a low score means that it contains many adverbs (among other features).

4 | Dimension 4 measures Overt Expression of Persuasion. A high score on this variable indicates that the text explicitly marks the author's point of view as well as their assessment of likelihood and/or certainty, as in formal correspondence (formal letters, notices, memoranda, etc.). A high score on this Dimension means that the text contains many modal verbs (among other features).

5 | Dimension 5 is the opposition between Abstract and Non-Abstract Information. A high score on this variable indicates that the text presents information in a technical, abstract and formal way, as in scientific discourse. A high score on this Dimension means that the text contains many conjuncts and passive clauses (among other features).

6 | Dimension 6 measures On-line Informational Elaboration, that is, elaboration produced in real time. A high score on this variable indicates that the text is informational in nature but produced under certain time constraints, as in speeches. A high score on this Dimension means that the text contains many postmodifications of noun phrases (among other features).
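The z-score handling and the Euclidean-distance assignment described earlier for the output files can be sketched in Python. This is a minimal illustration, not MAT's actual code: the function names, the sample z-scores and the two centroids are hypothetical, and the cap is assumed to apply symmetrically to negative z-scores.

```python
import math

def interesting_variables(z_scores, threshold=2.0):
    """Return the variables whose z-score magnitude exceeds the threshold."""
    return {name: z for name, z in z_scores.items() if abs(z) > threshold}

def correct_z_score(z, cap=5.0):
    """Cap the magnitude of a z-score at `cap` (assumed symmetric)."""
    return max(-cap, min(cap, z))

def closest_text_type(scores, centroids):
    """Return the name of the centroid nearest to `scores` (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(scores, centroids[name]))

# Hypothetical z-scores for three variables of one text.
sample = {"AMP": 0.4, "NOMZ": 7.3, "PASS": -2.6}
flagged = interesting_variables(sample)
corrected = {name: correct_z_score(z) for name, z in sample.items()}

# Made-up two-dimensional centroids, for illustration only.
toy_centroids = {"type_a": (0.0, 0.0), "type_b": (1.0, 1.0)}
nearest = closest_text_type((1.0, 2.0), toy_centroids)
```

The real text-type centroids come from Biber (1989) and are not reproduced here; only the distance rule is shown.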
Table of Biber's (1989) Text Types

Text type | Characterising genres | Characterising Dimensions | Description

Intimate Interpersonal Interaction | telephone conversations between friends | high score on D1, low score on D3, low score on D5, unmarked scores on the other Dimensions | Texts are typically interactions with an interpersonal concern that take place between close acquaintances.

Scientific Exposition | academic prose, official documents | low score on D1, high score on D3, high score on D5, unmarked scores on the other Dimensions | Texts are typically formal informational expositions that focus on conveying information and are very technical.

Informational Interaction | face-to-face interactions, telephone conversations, spontaneous speeches | high score on D1, low score on D3, low score on D5, unmarked scores on the other Dimensions | Texts are typically spoken interactions focused on information.

Learned Exposition | official documents, press reviews, academic prose | low score on D1, high score on D3, high score on D5, unmarked scores on the other Dimensions | Texts are formal expositions focused on conveying information.

Imaginative Narrative | romance fiction, general fiction, prepared speeches | high score on D2, low score on D3, unmarked scores on the other Dimensions | Texts are characterised by the narration of a sequence of events.

General Narrative Exposition | editorials, biographies, non-sports broadcasts, science fiction | high score on D2, unmarked scores on the other Dimensions | Texts use narration in order to convey information.

Situated Reportage | sports broadcasts | low score on D3, low score on D4, unmarked scores on the other Dimensions | Texts are live commentaries on events in progress.
Involved Persuasion | spontaneous speeches, formal letters, interviews | high score on D4, unmarked scores on the other Dimensions | Texts are typically persuasive and/or argumentative.

Reliability tests for the program

The program was tested for reliability on the LOB and Brown corpora. The results are reproduced below.

Table 1 - MAT analysis of the LOB corpus compared with the results of Biber (1988)

Genre | D1 | D2 | D3 | D4 | D5 | D6 | Text types
Press reportage - MAT | -14.02 | 0.97 | 2.81 | -0.38 | 0.52 | -0.72 | 59% General Narrative Exposition; 39% Learned Exposition; 2% Involved Persuasion; 2% Scientific Exposition
Press reportage - Biber (1988) | -15.01 | 0.4 | -0.3 | -0.7 | 0.6 | -0.9 | 73% General Narrative Exposition; 25% Learned Exposition; 2% Scientific Exposition
Difference | 0.99 | 0.57 | 3.11 | 0.32 | 0.08 | 0.18 |
Press editorials - MAT | -8.4 | -0.28 | 4.38 | 3.3 | 1.5 | 0.33 | 81% General Narrative Exposition; 7% Involved Persuasion; 7% Scientific Exposition; 4% Learned Exposition
Press editorials - Biber (1988) | -10 | -0.8 | 1.9 | 3.1 | 0.3 | 1.5 | 86% General Narrative Exposition; 11% Involved Persuasion; 4% Learned Exposition
Difference | 1.6 | 0.52 | 2.48 | 0.2 | 1.2 | 1.17 |
Press reviews - MAT | -12.45 | -0.74 | 5.38 | -2.32 | 0.36 | -1.01 | 53% General Narrative Exposition; 47% Learned Exposition
Press reviews - Biber (1988) | -13.9 | -1.6 | 4.3 | -2.8 | 0.8 | -1 | 47% Learned Exposition; 47% General Narrative Exposition; 6% Scientific Exposition
Difference | 1.45 | 0.86 | 1.08 | 0.48 | 0.44 | 0.01 |
Religion - MAT | -4.26 | 0.17 | 4.69 | 0.85 | 2.22 | 1.01 | 65% General Narrative Exposition; 29% Involved Persuasion; 6% Scientific Exposition
Religion - Biber (1988) | -7 | -0.7 | 3.7 | 0.2 | 1.4 | 1 | 59% General Narrative Exposition; 18% Involved Persuasion; 18% Learned Exposition; 6% Imaginative Narrative
Difference | 2.74 | 0.87 | 0.99 | 0.65 | 0.82 | 0.01 |
Hobbies - MAT | -9.42 | -2.1 | 3.15 | 1.51 | 2.54 | -0.35 | 34% General Narrative Exposition; 24% Learned Exposition; 24% Involved Persuasion; 18% Scientific Exposition
Hobbies - Biber (1988) | -10.1 | -2.9 | 0.3 | 1.7 | 1.2 | -0.7 | 43% General Narrative Exposition; 21% Learned Exposition; 21% Involved Persuasion; 7% Scientific Exposition; 7% Situated Reportage
Difference | 0.68 | 0.8 | 2.85 | 0.19 | 1.34 | 0.35 |
Popular lore - MAT | -9.58 | 0.31 | 3.42 | -0.61 | 1.4 | -0.64 | 36% Learned Exposition; 32% General Narrative Exposition; 20% Involved Persuasion; 2% Imaginative Narrative; 9% Scientific Exposition
Popular lore - Biber (1988) | -9.3 | -0.1 | 2.3 | -0.3 | 0.1 | -0.8 | 36% Learned Exposition; 36% Involved Persuasion; 21% General Narrative Exposition; 7% Imaginative Narrative
Difference | 0.28 | 0.41 | 1.12 | 0.31 | 1.3 | 0.16 |
Academic prose - MAT | -12.16 | -2.16 | 5.38 | -0.02 | 5.14 | 0.23 | 56% Scientific Exposition; 24% Learned Exposition; 14% General Narrative Exposition; 6% Involved Persuasion
Academic prose - Biber (1988) | -14.09 | -2.6 | 4.2 | -0.5 | 5.5 | 0.5 | 44% Scientific Exposition; 31% Learned Exposition; 17% General Narrative Exposition; 9% Involved Persuasion
Difference | 1.93 | 0.44 | 1.18 | 0.48 | 0.36 | 0.27 |
General fiction - MAT | 0.35 | 6.26 | 0.03 | 1.79 | -0.45 | -0.75 | 55% Imaginative Narrative; 31% General Narrative Exposition; 10% Involved Persuasion; 3% Learned Exposition
General fiction - Biber (1988) | -0.8 | 5.9 | -3.1 | 0.9 | -2.5 | -1.6 | 51% Imaginative Narrative; 41% General Narrative Exposition; 3% Intimate Interpersonal Interaction; 3% Involved Persuasion
Difference | 1.15 | 0.36 | 3.13 | 0.89 | 2.05 | 0.85 |
Mystery fiction - MAT | 0.82 | 5.76 | -0.7 | 1.55 | -0.69 | -1.13 | 67% Imaginative Narrative; 29% General Narrative Exposition; 4% Involved Persuasion
Mystery fiction - Biber (1988) | -0.2 | 6 | -3.6 | -0.7 | -2.8 | -1.9 | 70% Imaginative Narrative; 23% General Narrative Exposition; 8% Situated Reportage
Difference | 1.02 | 0.24 | 2.9 | 2.25 | 2.11 | 0.77 |
Science fiction - MAT | -5.01 | 6.1 | 1.08 | 0.21 | -0.54 | -0.54 | 83% General Narrative Exposition; 17% Imaginative Narrative
Science fiction - Biber (1988) | -6.1 | 5.9 | -1.4 | -0.7 | -2.5 | -1.6 | 50% General Narrative Exposition; 33% Imaginative Narrative; 17% Situated Reportage
Difference | 1.09 | 0.2 | 2.48 | 0.91 | 1.96 | 1.06 |
Adventure fiction - MAT | -0.85 | 5.89 | -1.29 | 0.19 | -0.97 | -1.29 | 69% Imaginative Narrative; 24% General Narrative Exposition; 3% Involved Persuasion; 3% Learned Exposition
Adventure fiction - Biber (1988) | 0 | 5.5 | -3.8 | -1.2 | -2.5 | -1.9 | 70% Imaginative Narrative; 31% General Narrative Exposition
Difference | 0.85 | 0.39 | 2.51 | 1.39 | 1.53 | 0.61 |
Romance fiction - MAT | 3.55 | 6.71 | -0.88 | 2.35 | -1.26 | -1 | 79% Imaginative Narrative; 17% General Narrative Exposition; 3% Involved Persuasion
Romance fiction - Biber (1988) | 4.3 | 7.2 | -4.1 | 1.8 | -3.1 | -1.2 | 92% Imaginative Narrative; 8% General Narrative Exposition
Difference | 0.75 | 0.49 | 3.22 | 0.55 | 1.84 | 0.2 |
Humor - MAT | -6.19 | 1.43 | 1.62 | 0.43 | 0.65 | -0.56 | 78% General Narrative Exposition; 11% Imaginative Narrative; 11% Involved Persuasion
Humor - Biber (1988) | -7.8 | 0.9 | -0.8 | -0.3 | -0.4 | -1.5 | 89% General Narrative Exposition; 11% Involved Persuasion
Difference | 1.61 | 0.53 | 2.42 | 0.73 | 1.05 | 0.94 |

The results obtained by MAT for each Dimension show that the program replicates the analysis of Biber (1988) satisfactorily. For Dimension 1 the difference ranges from 0.28 for Popular lore to 2.74 for Religion. However, given the range of Dimension 1, a text can still be placed in the correct area even with a difference of 3 points. For Dimension 2 the difference ranges from 0.2 for Science fiction to 0.87 for Religion.
This difference of less than one point is not significant for the typology and/or location of the text(s) on Dimension 2. For Dimension 3 the difference ranges from 0.99 for Religion to 3.22 for Romance fiction. Differences of 2 or more may affect the reliability of the Dimension 3 results, because the range of this Dimension is small. For Dimension 4 the difference ranges from 0.19 for Hobbies to 2.25 for Mystery fiction. Apart from this value, all the other values indicate no substantial difference between the results of Biber (1988) and those of MAT. For Dimension 5 the difference ranges from 0.08 for Press reportage to 2.11 for Mystery fiction. Apart from this value, all the other values again indicate no substantial difference between the results of Biber (1988) and those of MAT. Finally, for Dimension 6 the difference ranges from 0.01 for Press reviews and Religion to 1.06 for Science fiction, so there is no substantial difference between the results of Biber (1988) and those of MAT.

We can therefore conclude that MAT replicates the model of Biber (1988) satisfactorily. Only Dimension 3 produced anomalous results. An exploration of the z-scores showed that the Dimension 3 scores produced by MAT are inflated by high z-scores for general adverbs. At this stage, however, no single cause has been identified as responsible for this discrepancy. Until the problem is resolved, the Dimension 3 scores produced by MAT should be treated with caution. Although the Dimension 3 differences are moderate, in many cases they do not affect the assignment of the text type, since most text types are unmarked for Dimension 3. The text type assignments made by MAT are generally accurate, with some minor inaccuracies that are probably caused by small differences between the dictionaries or rules used by the Stanford Tagger and those of the tagger used in Biber (1988).
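The difference rows in the reliability tables appear to be the absolute difference between the MAT score and the Biber (1988) score on each Dimension. The short Python sketch below checks this against the press reportage rows of the LOB table; the function name is hypothetical, and rounding to two decimals is an assumption made to match the precision of the table.

```python
def abs_differences(mat_scores, biber_scores):
    """Absolute per-Dimension difference, rounded to two decimals."""
    return [round(abs(m - b), 2) for m, b in zip(mat_scores, biber_scores)]

# Press reportage (LOB), Dimensions 1 to 6, as given in the table.
mat = [-14.02, 0.97, 2.81, -0.38, 0.52, -0.72]
biber = [-15.01, 0.4, -0.3, -0.7, 0.6, -0.9]

differences = abs_differences(mat, biber)
largest = max(differences)  # the Dimension 3 value for this genre
```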
Another test was run on the Brown corpus; the results are presented below.

Table 2 - MAT analysis of the Brown corpus compared with the results of Biber (1988)

Genre | D1 | D2 | D3 | D4 | D5 | D6 | Text types
Press reportage - MAT | -17.61 | 0.09 | 4.51 | -1.55 | 0.85 | -1.11 | 75% Learned Exposition; 20% General Narrative Exposition; 4% Scientific Exposition
Press reportage - Biber (1988) | -15.01 | 0.4 | -0.3 | -0.7 | 0.6 | -0.9 | 73% General Narrative Exposition; 25% Learned Exposition; 2% Scientific Exposition
Difference | 2.6 | 0.31 | 4.81 | 0.85 | 0.25 | 0.21 |
Press editorials - MAT | -10.71 | -0.59 | 4.5 | 1.39 | 0.63 | -0.28 | 63% General Narrative Exposition; 7% Involved Persuasion; 7% Learned Exposition
Press editorials - Biber (1988) | -10 | -0.8 | 1.9 | 3.1 | 0.3 | 1.5 | 86% General Narrative Exposition; 11% Involved Persuasion; 4% Learned Exposition
Difference | 0.71 | 0.21 | 2.6 | 1.71 | 0.33 | 1.78 |
Press reviews - MAT | -13.83 | -1.32 | 5.27 | -3.31 | 0.41 | -1.08 | 59% Learned Exposition; 41% General Narrative Exposition
Press reviews - Biber (1988) | -13.9 | -1.6 | 4.3 | -2.8 | 0.8 | -1 | 47% Learned Exposition; 47% General Narrative Exposition; 6% Scientific Exposition
Difference | 0.07 | 0.28 | 0.97 | 0.51 | 0.39 | 0.08 |
Religion - MAT | -7.17 | -0.11 | 5.1 | 0.39 | 2.11 | 0.49 | 35% General Narrative Exposition; 29% Involved Persuasion; 24% Learned Exposition
Religion - Biber (1988) | -7 | -0.7 | 3.7 | 0.2 | 1.4 | 1 | 59% General Narrative Exposition; 18% Involved Persuasion; 18% Learned Exposition; 6% Imaginative Narrative
Difference | 0.17 | 0.59 | 1.4 | 0.19 | 0.71 | 0.51 |
Hobbies - MAT | -12.44 | -2.66 | 4.47 | -0.86 | 1.34 | -1.15 | 50% Learned Exposition; 36% General Narrative Exposition; 6% Involved Persuasion; 8% Scientific Exposition
Hobbies - Biber (1988) | -10.1 | -2.9 | 0.3 | 1.7 | 1.2 | -0.7 | 43% General Narrative Exposition; 21% Learned Exposition; 21% Involved Persuasion; 7% Scientific Exposition; 7% Situated Reportage
Difference | 2.34 | 0.24 | 4.17 | 2.56 | 0.14 | 0.45 |
Popular lore - MAT | -13.3 | -0.1 | 3.9 | -1.03 | 1.38 | -0.67 | 44% Learned Exposition; 42% General Narrative Exposition; 8% Involved Persuasion; 6% Scientific Exposition
Popular lore - Biber (1988) | -9.3 | -0.1 | 2.3 | -0.3 | 0.1 | -0.8 | 36% Learned Exposition; 36% Involved Persuasion; 21% General Narrative Exposition; 7% Imaginative Narrative
Difference | 4 | 0 | 1.6 | 0.73 | 1.28 | 0.13 |
Academic prose - MAT | -13.58 | -2.33 | 5.93 | -0.88 | 4.48 | 0.01 | 38% Scientific Exposition; 38% Learned Exposition; 23% General Narrative Exposition; 3% Involved Persuasion
Academic prose - Biber (1988) | -14.09 | -2.6 | 4.2 | -0.5 | 5.5 | 0.5 | 44% Scientific Exposition; 31% Learned Exposition; 17% General Narrative Exposition; 9% Involved Persuasion
Difference | 0.51 | 0.27 | 1.73 | 0.38 | 1.02 | 0.49 |
General fiction - MAT | -5.83 | 5.86 | 0.19 | -0.33 | -0.44 | -1.22 | 66% Imaginative Narrative; 24% General Narrative Exposition; 10% Involved Persuasion
General fiction - Biber (1988) | -0.8 | 5.9 | -3.1 | 0.9 | -2.5 | -1.6 | 51% Imaginative Narrative; 41% General Narrative Exposition; 3% Intimate Interpersonal Interaction; 3% Involved Persuasion
Difference | 5.03 | 0.04 | 3.29 | 1.23 | 2.06 | 0.38 |
Mystery fiction - MAT | -2.21 | 5.57 | -1.22 | 0.13 | -1.03 | -1 | 46% Imaginative Narrative; 42% Imaginative Narrative; 13% Involved Persuasion
Mystery fiction - Biber (1988) | -0.2 | 6 | -3.6 | -0.7 | -2.8 | -1.9 | 70% Imaginative Narrative; 23% General Narrative Exposition; 8% Situated Reportage
Difference | 2.01 | 0.43 | 2.38 | 0.83 | 1.77 | 0.9 |
Science fiction - MAT | -4.1 | 4.79 | 1.3 | 0.12 | 0.79 | -0.78 | 50% General Narrative Exposition; 17% Imaginative Narrative; 17% Involved Persuasion; 17% Learned Exposition
Science fiction - Biber (1988) | -6.1 | 5.9 | -1.4 | -0.7 | -2.5 | -1.6 | 50% General Narrative Exposition; 33% Imaginative Narrative; 17% Situated Reportage
Difference | 2 | 1.11 | 2.7 | 0.82 | 3.29 | 0.82 |
Adventure fiction - MAT | -6.05 | 5.88 | -0.81 | -1.78 | -1.05 | -1.39 | 66% General Narrative Exposition; 31% Imaginative Narrative; 3% Learned Exposition
Adventure fiction - Biber (1988) | 0 | 5.5 | -3.8 | -1.2 | -2.5 | -1.9 | 70% Imaginative Narrative; 31% General Narrative Exposition
Difference | 6.05 | 0.38 | 2.99 | 0.58 | 1.45 | 0.51 |
Romance fiction - MAT | 0.83 | 6.02 | 0.41 | -0.08 | -1.15 | -1.08 | 59% Imaginative Narrative; 31% General Narrative Exposition; 10% Involved Persuasion
Romance fiction - Biber (1988) | 4.3 | 7.2 | -4.1 | 1.8 | -3.1 | -1.2 | 92% Imaginative Narrative; 8% General Narrative Exposition
Difference | 3.47 | 1.18 | 4.51 | 1.88 | 1.95 | 0.12 |
Humor - MAT | -6.76 | 2.96 | 2.56 | -1.16 | 0.42 | -0.46 | 67% General Narrative Exposition; 22% Imaginative Narrative; 11% Learned Exposition
Humor - Biber (1988) | -7.8 | 0.9 | -0.8 | -0.3 | -0.4 | -1.5 | 89% General Narrative Exposition; 11% Involved Persuasion
Difference | 1.04 | 2.06 | 3.36 | 0.86 | 0.82 | 1.04 |

There are considerable differences between the MAT scores and the scores of Biber (1988). However, since the Brown corpus contains the same genres as the LOB corpus but different texts, the results obtained from the Brown corpus suggest that the Dimensions created by Biber (1988) remain valid for those genres even with different texts. The results of this last test are encouraging and suggest that MAT can be applied to texts to obtain scores on Biber's (1988) Dimensions. In addition, MAT assigns a text type to each text according to its features, as proposed by Biber (1989).

List of variables

Each variable is given a short description. Next to the name of each variable is the tag used to identify it. An asterisk next to a variable name marks a variable whose results were manually checked by Biber (1988). This version of the tagger does not allow any manual intervention in the tagging process. However, the texts can be checked manually before the analysis is run.

AMP: Amplifiers
This tag finds any of the items in this list: absolutely, altogether, completely, enormously, entirely, extremely, fully, greatly, highly, intensely, perfectly, strongly, thoroughly, totally, utterly, very.
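A list-based tag such as AMP amounts to a simple lookup. The following Python sketch illustrates the idea under stated assumptions: tokens are represented as hypothetical (word, tag) pairs, matching is assumed to be case-insensitive, and the function name is invented for this example rather than taken from MAT.

```python
# The sixteen amplifiers listed in the AMP entry.
AMPLIFIERS = {
    "absolutely", "altogether", "completely", "enormously", "entirely",
    "extremely", "fully", "greatly", "highly", "intensely", "perfectly",
    "strongly", "thoroughly", "totally", "utterly", "very",
}

def tag_amplifiers(tokens):
    """Replace the tag of every amplifier with AMP; keep other tags as they are."""
    return [
        (word, "AMP") if word.lower() in AMPLIFIERS else (word, tag)
        for word, tag in tokens
    ]

tagged = tag_amplifiers([("Very", "RB"), ("good", "JJ"), ("indeed", "RB")])
```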
ANDC: Independent clause coordination
This tag is assigned to the word and when it is found in one of the following patterns: (1) preceded by a comma and followed by it, so, then, there + BE, a demonstrative pronoun (DEMP) or a subject personal pronoun; (2) preceded by punctuation; (3) followed by a WH pronoun or any WH word, an adverbial subordinator (CAUS, COND, OSUB), a discourse particle (DPAR) or a conjunct (CONJ).

AWL: Word length
The mean length of the words in the text, in letters. A word is any sequence of characters separated by spaces in the text as tokenised by the Stanford Tagger.

BEMA: Be as main verb
BE is tagged as a main verb in the following pattern: BE followed by a determiner (DT), a possessive pronoun (PRP$), a preposition (PIN) or an adjective (JJ). The algorithm was improved in this tagger by taking into account that adverbs or negations can appear between the verb BE and the rest of the pattern. In addition, the algorithm was slightly modified and improved: (a) the double tagging of there followed by BE as BEMA was solved by adding the condition that there must not appear before the pattern; (b) the cardinal number tag (CD) and the personal pronoun tag (PRP) were added to the list of items that can follow the verb BE.

BYPA: By-passives
This tag is assigned when the passive pattern (PASS) is found and the preposition by follows it.

CAUS: Causative adverbial subordinators
This tag identifies any occurrence of the word because.

CONC: Concessive adverbial subordinators
This tag identifies any occurrence of the words although and though. Biber's algorithm was extended by adding the abbreviation tho.

COND: Conditional adverbial subordinators
This tag identifies any occurrence of the words if and unless.
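The three single-word subordinator tags above (CAUS, CONC, COND) can be illustrated with one lookup table. This is only a sketch: the dictionary and function names are hypothetical, lower-casing before lookup is an assumption, and untagged words are given None for simplicity.

```python
# Word lists taken from the CAUS, CONC and COND entries.
SUBORDINATORS = {
    "because": "CAUS",
    "although": "CONC",
    "though": "CONC",
    "tho": "CONC",   # abbreviation added to Biber's list
    "if": "COND",
    "unless": "COND",
}

def tag_subordinators(words):
    """Pair each word with its subordinator tag, or None if it has none."""
    return [(word, SUBORDINATORS.get(word.lower())) for word in words]

result = tag_subordinators(["Stay", "unless", "it", "rains", "because", "tho"])
```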
CONJ: Conjuncts
This tag finds any of the following items: punctuation + else, punctuation + altogether, punctuation + rather, alternatively, consequently, conversely, e.g., furthermore, hence, however, i.e., instead, likewise, moreover, namely, nevertheless, nonetheless, notwithstanding, otherwise, similarly, therefore, thus, viz, in comparison, in contrast, in particular, in addition, in conclusion, in consequence, in sum, in summary, for example, for instance, instead of, by contrast, by comparison, in any event, in any case, in other words, as a result, as a consequence, on the contrary, on the other hand. Some corrections were made to the list above. For example, there is a redundancy where Biber lists the word rather twice. Rather is counted only when it appears after a punctuation mark. The same applies to altogether. In the case of multi-word units such as on the other hand, only the first word is tagged as CONJ; the other words are tagged NULL.

CONT: Contractions
Contractions were tagged by identifying an apostrophe followed by a tagged word, or the item n't.

DEMO: Demonstratives
A demonstrative is found when the words that, this, these, those have not been tagged as DEMP, TOBJ, TSUB, THAC or THVC.

DEMP: Demonstrative pronouns*
The program tags the words those and these as demonstrative pronouns when they are followed by a verb (any tag starting with V), an auxiliary verb (a modal with the tag MD, or a form of DO, HAVE or BE), a punctuation mark, a WH pronoun or the word and. The word that is tagged as a demonstrative pronoun when it follows the pattern above, or when it is followed by 's or is, and, at the same time, it has not already been tagged as TOBJ, TSUB, THAC or THVC.
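The treatment of multi-word units, where the first word carries the tag and the remaining words receive NULL, can be sketched as follows. The function is a hypothetical illustration, not MAT's implementation: it handles one phrase at a time, matches case-insensitively by assumption, and leaves unmatched words with None.

```python
def tag_multiword(words, phrase, tag):
    """Tag the first word of each occurrence of `phrase` with `tag`, the rest with NULL."""
    target = phrase.lower().split()
    tags = [None] * len(words)
    i = 0
    while i < len(words):
        window = [w.lower() for w in words[i:i + len(target)]]
        if window == target:
            tags[i] = tag
            for j in range(i + 1, i + len(target)):
                tags[j] = "NULL"
            i += len(target)  # skip past the matched unit
        else:
            i += 1
    return list(zip(words, tags))

sentence = ["On", "the", "other", "hand", "it", "works"]
tagged_unit = tag_multiword(sentence, "on the other hand", "CONJ")
```

Tagging the trailing words as NULL keeps the token count unchanged while ensuring the unit is counted once.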
DPAR: Discourse particles
The program tags as discourse particles the words well, now, anyhow, anyways when they are preceded by a punctuation mark.

DWNT: Downtoners
This tag finds any of the items in this list: almost, barely, hardly, merely, mildly, nearly, only, partially, partly, practically, scarcely, slightly, somewhat. The word almost was classified by Biber as both a hedge and a downtoner. In this tagger almost is treated only as a downtoner.

EMPH: Emphatics
This tag covers the following items: just, really, most, more, real + adjective, so + adjective, any form of DO, for sure, a lot, such a. In the case of multi-word units such as a lot, only the first word is tagged as EMPH; the other words are tagged NULL.

EX: Existential there
Existential there is tagged as EX by the Stanford Tagger. For more information see http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf.

FPP1: First person pronouns
Any of the items in this list: I, me, us, my, we, our, myself, ourselves.

GER: Gerunds*
The program tags as a gerund any nominal form (N) that ends in -ing or -ings. To improve accuracy, only words longer than 10 characters are counted as gerunds.

HDG: Hedges
This tag finds any of the items in this list: maybe, at about, something like, more or less, sort of, kind of (the last two items must be preceded by a determiner (DT), a quantifier (QUAN), a cardinal number (CD), an adjective (JJ or PRED), a possessive pronoun (PRP$) or a WH word (see the section on WH-questions)). In the case of multi-word units such as more or less, only the first word is tagged as HDG; the other words are tagged NULL.
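The GER rule in this part of the list combines a tag test, a suffix test and a length threshold. A minimal Python sketch, assuming that "nominal form (N)" means any tag beginning with N and that the check is case-insensitive (the function name is hypothetical):

```python
def is_gerund(word, tag):
    """True for a nominal form ending in -ing or -ings with more than 10 characters."""
    return (
        tag.startswith("N")
        and len(word) > 10
        and word.lower().endswith(("ing", "ings"))
    )

checks = {
    "understanding": is_gerund("understanding", "NN"),   # 13 characters, noun
    "building": is_gerund("building", "NN"),             # too short
    "understanding_vbg": is_gerund("understanding", "VBG"),  # not a nominal tag
}
```

The length threshold trades recall for precision: short nouns such as building are never counted.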
INPR: Indefinite pronouns
Any of the items in this list: anybody, anyone, anything, everybody, everyone, everything, nobody, none, nothing, nowhere, somebody, something.

JJ: Attributive adjectives (e.g. the big horse)
Biber (1988) specifies that attributive adjectives are those followed by another adjective or by a noun. However, Biber also states that adjectives not identified as predicative were counted as attributive. The tagger therefore has no algorithm for identifying attributive adjectives. All the adjectives that the Stanford Tagger has tagged as JJ, JJS or JJR are treated as attributive adjectives and are all retagged as JJ. Predicative adjectives are identified by a separate algorithm and are therefore kept distinct.

NEMD: Necessity modals
The necessity modals listed by Biber (1988) are: ought, should, must.

NN: Total other nouns
Any noun tagged as NN by the Stanford Tagger that has not been identified as a nominalization or a gerund keeps that tag. The tags for plural nouns (NNS) and proper nouns (NNP and NNPS) are changed to NN and included in this count.

NOMZ: Nominalizations
Any noun ending in -tion, -ment, -ness or -ity, as well as the plurals of these. Although Biber (1988) does not mention that this variable was manually checked, a stop list was probably used to avoid misclassifications (e.g. city). However, this is not indicated in the appendix of Biber (1988).

OSUB: Other adverbial subordinators
This tag identifies any occurrence of the following items: since, while, whilst, whereupon, whereas, whereby, such that, so that (followed by a word that is neither a noun nor an adjective), such that (followed by a word that is neither a noun nor an adjective), inasmuch as, forasmuch as, insofar as, insomuch as, as long as, as soon as.
In the case of multi-word units such as as long as, only the first word is tagged as OSUB; the other words are tagged NULL.

Other Stanford tags
If the user selects "all tags" in the main window, all the tags assigned by the Stanford Tagger are listed as well. A list of the Stanford tags and a description of how they are identified can be found here: http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf

PASS: Agentless passives
This tag is assigned when one of the following patterns is found: (a) any form of BE followed by a participle (VBN or VBD), with or without one or two intervening adverbs (RB) or negations; (b) any form of BE followed by a nominal form (a noun, NN, NNP, or a personal pronoun, PRP) and a participle (VBN or VBD). Some changes were made to Biber's algorithm in this tagger. It was considered necessary to allow for a negation in pattern (b). The tag is therefore also assigned when a negation precedes the nominal form in pattern (b).

PASTP: Past participial clauses* (e.g. Built in a single week, the house would stand for fifty years)
This tag is assigned to the following pattern: a punctuation mark followed by a past participle (VBN) followed by a preposition (PIN) or an adverb (RB).

PEAS: Perfect aspect
This is calculated by counting the occurrences of HAVE followed by a VBD or VBN tag (a past or participle form of any verb). Occurrences are also counted when an adverb (RB) or a negation (XX0) stands between the two words. The interrogative version is counted as well, by counting the occurrences of HAVE followed by a nominal form (noun, NN, proper noun, NP, or personal pronoun, PRP) and then by a VBD or VBN tag.
As with the affirmative version, the latter algorithm also allows for intervening adverbs or negations.

PHC: Phrasal coordination
This tag is assigned to any and that is preceded and followed by the same tag, when that tag is an adverb tag, an adjective tag, a verb tag or a noun tag.

PIN: Total prepositional phrases
This tag identifies any occurrence of the prepositions listed by Biber (1988). As described in the section on infinitives, a distinction was made between the preposition to and to as an infinitive marker. Biber (1988) does not specify whether he counted every instance of the word to or distinguished between the two cases. In this tagger, however, the distinction was necessary for a more accurate result.

PIRE: Pied-piping relative clauses (e.g. the manner in which he was told)
This tag identifies the following pattern: any preposition (PIN) followed by who, whose or which.

PIT: Pronoun it
Any occurrence of the pronoun it. Although this is not mentioned by Biber (1988), this program also tags the pronouns its and itself as "Pronoun it".

PLACE: Place adverbials
Any of the items in this list: aboard, above, abroad, across, ahead, alongside, around, ashore, astern, away, behind, below, beneath, beside, downhill, downstairs, downstream, east, far, hereabouts, indoors, inland, inshore, inside, locally, near, nearby, north, nowhere, outdoors, outside, overboard, overland, overseas, south, underfoot, underground, underneath, uphill, upstairs, upstream, west. If an item has been tagged by the Stanford Tagger as a proper noun (NNP), it is not tagged as a place adverbial.

POMD: Possibility modals
The possibility modals listed by Biber (1988): can, may, might, could.
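Two-token patterns such as PIRE can be checked with a single pass over adjacent tokens. The Python sketch below is an illustration only: tokens are hypothetical (word, tag) pairs in which prepositions already carry the PIN tag, the other tags in the example are placeholders, and the function reports where the pattern starts rather than deciding which word receives the tag, since the text does not specify that.

```python
RELATIVIZERS = {"who", "whose", "which"}

def find_pied_piping(tokens):
    """Return the index of each preposition (PIN) directly followed by who, whose or which."""
    return [
        i
        for i in range(len(tokens) - 1)
        if tokens[i][1] == "PIN" and tokens[i + 1][0].lower() in RELATIVIZERS
    ]

# "the manner in which he was told", with placeholder tags.
example = [
    ("the", "DT"), ("manner", "NN"), ("in", "PIN"), ("which", "WDT"),
    ("he", "PRP"), ("was", "VBD"), ("told", "VBN"),
]
matches = find_pied_piping(example)
```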
PRED: Predicative adjectives (e.g. the horse is big)
The PRED tag covers adjectives in the following pattern: any form of BE followed by an adjective (JJ) that is in turn followed by a word that is not another adjective, an adverb (RB) or a noun (N). The pattern still holds if an adverb or a negation stands between the adjective and the following word. For a more accurate result, one change was made to Biber's algorithm: an adjective is also tagged as predicative if it is preceded by another predicative adjective followed by a phrasal coordinator (PHC). An example of this pattern is the sentence: the horse is big and fast.

PRESP: Present participial clauses* (e.g. Stuffing his mouth with cookies, Joe ran out the door)
This tag is assigned when the following pattern is found: a punctuation mark followed by a present participle (VBG) followed by a preposition (PIN), a determiner (DT, QUAN, CD), a WH pronoun, a WH possessive pronoun (WP$), any WH word, any pronoun (PRP) or any adverb (RB).

PRIV: Private verbs
This tag covers the verbs listed by Quirk et al.
(1985: 1181-2): accept, accepts, accepting, accepted, anticipate, anticipates, anticipating, anticipated, ascertain, ascertains, ascertaining, ascertained, assume, assumes, assuming, assumed, believe, believes, believing, believed, calculate, calculates, calculating, calculated, check, checks, checking, checked, conclude, concludes, concluding, concluded, conjecture, conjectures, conjecturing, conjectured, consider, considers, considering, considered, decide, decides, deciding, decided, deduce, deduces, deducing, deduced, deem, deems, deeming, deemed, demonstrate, demonstrates, demonstrating, demonstrated, determine, determines, determining, determined, discern, discerns, discerning, discerned, discover, discovers, discovering, discovered, doubt, doubts, doubting, doubted, dream, dreams, dreaming, dreamt, dreamed, ensure, ensures, ensuring, ensured, establish, establishes, establishing, established, estimate, estimates, estimating, estimated, expect, expects, expecting, expected, fancy, fancies, fancying, fancied, fear, fears, fearing, feared, feel, feels, feeling, felt, find, finds, finding, found, foresee, foresees, foreseeing, foresaw, forget, forgets, forgetting, forgot, forgotten, gather, gathers, gathering, gathered, guess, guesses, guessing, guessed, hear, hears, hearing, heard, hold, holds, holding, held, hope, hopes, hoping, hoped, imagine, imagines, imagining, imagined, imply, implies, implying, implied, indicate, indicates, indicating, indicated, infer, infers, inferring, inferred, insure, insures, insuring, insured, judge, judges, judging, judged, know, knows, knowing, knew, known, learn, learns, learning, learnt, learned, mean, means, meaning, meant, note, notes, noting, noted, notice, notices, noticing, noticed, observe, observes, observing, observed, perceive, perceives, perceiving, perceived, presume, presumes, presuming, presumed, presuppose, presupposes, presupposing, presupposed, pretend, pretends, pretending, pretended, prove, proves, proving,
proved, realize, realise, realising, realizing, realises, realizes, realised, realized, reason, reasons, reasoning, reasoned, recall, recalls, recalling, recalled, reckon, reckons, reckoning, reckoned, recognize, recognise, recognizes, recognises, recognizing, recognising, recognized, recognised, reflect, reflects, reflecting, reflected, remember, remembers, remembering, remembered, reveal, reveals, revealing, revealed, see, sees, seeing, saw, seen, sense, senses, sensing, sensed, show, shows, showing, showed, shown, signify, signifies, signifying, signified, suppose, supposes, supposing, supposed, suspect, suspects, suspecting, suspected, think, thinks, thinking, thought, understand, understands, understanding, understood.

PRMD: Predictive modals
The predictive modals listed by Biber (1988): will, would, shall and their contractions: 'd_MD, ll_MD, wo_MD, sha_MD.

PROD: Pro-verb do
Any form of DO used as a main verb, which therefore excludes DO used as an auxiliary. The tagger tags as PROD any DO that falls outside these patterns: (a) DO followed by a verb (any tag starting with V), or followed by adverbs (RB), negations and then a verb (V); (b) DO preceded by a punctuation mark or a WH pronoun (the list of WH pronouns is given in Biber (1988)).

PUBV: Public verbs
This tag finds any of the items in the list from Quirk et al.
(1985: 1180–1): acknowledge, acknowledged, acknowledges, acknowledging, add, adds, adding, added, admit, admits, admitting, admitted, affirm, affirms, affirming, affirmed, agree, agrees, agreeing, agreed, allege, alleges, alleging, alleged, announce, announces, announcing, announced, argue, argues, arguing, argued, assert, asserts, asserting, asserted, bet, bets, betting, boast, boasts, boasting, boasted, certify, certifies, certifying, certified, claim, claims, claiming, claimed, comment, comments, commenting, commented, complain, complains, complaining, complained, concede, concedes, conceding, conceded, confess, confesses, confessing, confessed, confide, confides, confiding, confided, confirm, confirms, confirming, confirmed, contend, contends, contending, contended, convey, conveys, conveying, conveyed, declare, declares, declaring, declared, deny, denies, denying, denied, disclose, discloses, disclosing, disclosed, exclaim, exclaims, exclaiming, exclaimed, explain, explains, explaining, explained, forecast, forecasts, forecasting, forecasted, foretell, foretells, foretelling, foretold, guarantee, guarantees, guaranteeing, guaranteed, hint, hints, hinting, hinted, insist, insists, insisting, insisted, maintain, maintains, maintaining, maintained, mention, mentions, mentioning, mentioned, object, objects, objecting, objected, predict, predicts, predicting, predicted, proclaim, proclaims, proclaiming, proclaimed, promise, promises, promising, promised, pronounce, pronounces, pronouncing, pronounced, prophesy, prophesies, prophesying, prophesied, protest, protests, protesting, protested, remark, remarks, remarking, remarked, repeat, repeats, repeating, repeated, reply, replies, replying, replied, report, reports, reporting, reported, say, says, saying, said, state, states, stating, stated, submit, submits, submitting, submitted, suggest, suggests, suggesting, suggested, swear, swears, swearing, swore, sworn, testify, testifies, testifying, testified, vow, vows, 
vowing, vowed, warn, warns, warning, warned, write, writes, writing, wrote, written.

RB: Adverbs. All adverbs that the Stanford Tagger labelled as RB, RBS, RBR or WRB are relabelled as RB so that the final count of all adverbs is accurate (for more detail, see http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf).

SERE: Sentence relatives* (e.g. Bob likes fried mangoes, which is disgusting). A sentence relative is tagged whenever a punctuation mark is followed by the word which.

SMP: Seem|appear. Any occurrence of any form of the verbs seem and appear.

SPAU: Split auxiliaries (e.g. they are objectively shown that...). Split auxiliaries are identified whenever an auxiliary (any modal MD, or any form of DO, BE or HAVE) is followed by one or two adverbs and a verb form.

SPIN: Split infinitives (e.g. he wants to convincingly prove that…). A split infinitive is identified whenever the infinitive marker to is followed by one or two adverbs and a verb form.

SPP2: Second person pronouns. Any item from this list: you, your, yourself, yourselves, thy, thee, thyself, thou.

STPR: Stranded preposition (e.g. the candidate that I was thinking of). A stranded preposition is identified whenever a preposition is followed by a punctuation mark. One adaptation was made, however: the preposition cannot be besides, because this word can also be a connective that is usually followed by punctuation.

SUAV: Suasive verbs. This tag finds the following items listed by Quirk et al.
(1985: 1182–3): agree, agrees, agreeing, agreed, allow, allows, allowing, allowed, arrange, arranges, arranging, arranged, ask, asks, asking, asked, beg, begs, begging, begged, command, commands, commanding, commanded, concede, concedes, conceding, conceded, decide, decides, deciding, decided, decree, decrees, decreeing, decreed, demand, demands, demanding, demanded, desire, desires, desiring, desired, determine, determines, determining, determined, enjoin, enjoins, enjoining, enjoined, ensure, ensures, ensuring, ensured, entreat, entreats, entreating, entreated, grant, grants, granting, granted, insist, insists, insisting, insisted, instruct, instructs, instructing, instructed, intend, intends, intending, intended, move, moves, moving, moved, ordain, ordains, ordaining, ordained, order, orders, ordering, ordered, pledge, pledges, pledging, pledged, pray, prays, praying, prayed, prefer, prefers, preferring, preferred, pronounce, pronounces, pronouncing, pronounced, propose, proposes, proposing, proposed, recommend, recommends, recommending, recommended, request, requests, requesting, requested, require, requires, requiring, required, resolve, resolves, resolving, resolved, rule, rules, ruling, ruled, stipulate, stipulates, stipulating, stipulated, suggest, suggests, suggesting, suggested, urge, urges, urging, urged, vote, votes, voting, voted.

SYNE: Synthetic negation. The following pattern is tagged as synthetic negation: no followed by an adjective (JJ or PRED) and a common or proper noun. The words neither and nor are also tagged as synthetic negation.

THAC: That adjective complements*. The program tags as THAC the word that when it is preceded by an adjective (JJ or a predicative adjective, PRED).
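The THAC rule above depends only on the tag of the word immediately before that, so it can be sketched in a few lines. This is an illustrative sketch, not the program's actual code: the (word, tag) pair representation and the function name `tag_thac` are assumptions made for the example.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag); hypothetical representation for this sketch

ADJECTIVE_TAGS = {"JJ", "PRED"}


def tag_thac(tokens: List[Token]) -> List[Token]:
    """Relabel 'that' as THAC when the preceding token is an adjective."""
    out = list(tokens)
    for i in range(1, len(tokens)):
        word, _ = tokens[i]
        # Check the tag of the previous token in the original sequence.
        if word.lower() == "that" and tokens[i - 1][1] in ADJECTIVE_TAGS:
            out[i] = (word, "THAC")
    return out


sentence = [("I", "PRP"), ("am", "VBP"), ("glad", "JJ"),
            ("that", "IN"), ("you", "PRP"), ("came", "VBD")]
tagged = tag_thac(sentence)
```

In this example only the fourth token changes, because glad carries the JJ tag; a that following a noun is left untouched.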
THATD: Subordinator that deletion. The tag THATD is applied when one of the following patterns is found: (1) a public, private or suasive verb followed by a demonstrative pronoun (DEMP) or a subject form of a personal pronoun; (2) a public, private or suasive verb followed by a pronoun (PRP) or a noun (N) and then a verb (V) or auxiliary verb, or followed by an adverb (RB), a determiner (DT, QUAN, CD) or a possessive pronoun (PRP$), then a noun (N), then a verb or auxiliary verb. An adjective (JJ or PRED) may also occur between the noun and the word preceding it.

THVC: That verb complements. This tag is applied when the word that is: (1) preceded by and, nor, but, or, also or any punctuation mark and followed by a determiner (DT, QUAN, CD), a pronoun (PRP), there, a plural noun (NNS) or a proper noun (NNP); (2) preceded by a public, private or suasive verb or a form of seem or appear and followed by any word that is NOT a verb (V), an auxiliary verb (MD, form of DO, form of HAVE, form of BE), a punctuation mark or the word and; (3) preceded by a public, private or suasive verb or a form of seem or appear, a preposition and up to four words that are not nouns (N).

TIME: Time adverbials. The items on this list are: afterwards, again, earlier, early, eventually, formerly, immediately, initially, instantly, late, lately, later, momentarily, now, nowadays, once, originally, presently, previously, recently, shortly, simultaneously, subsequently, today, to-day, tomorrow, to-morrow, tonight, to-night, yesterday. The list in Biber (1988) was extended with the word soon, which is not tagged as a time adverbial when it is followed by as. In addition, the older spellings of the adverbs beginning with to- were added (e.g. to-morrow).
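The TIME rule above is a word-list lookup with one contextual exception for soon. The sketch below illustrates that logic under stated assumptions: tokens are (word, tag) pairs, matching is case-insensitive, and the name `tag_time` is hypothetical rather than taken from the program.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag); hypothetical representation for this sketch

TIME_ADVERBIALS = {
    "afterwards", "again", "earlier", "early", "eventually", "formerly",
    "immediately", "initially", "instantly", "late", "lately", "later",
    "momentarily", "now", "nowadays", "once", "originally", "presently",
    "previously", "recently", "shortly", "simultaneously", "subsequently",
    "today", "to-day", "tomorrow", "to-morrow", "tonight", "to-night",
    "yesterday",
}


def tag_time(tokens: List[Token]) -> List[Token]:
    """Relabel time adverbials as TIME; 'soon' counts unless followed by 'as'."""
    out = []
    for i, (word, tag) in enumerate(tokens):
        lower = word.lower()
        next_word = tokens[i + 1][0].lower() if i + 1 < len(tokens) else None
        if lower in TIME_ADVERBIALS or (lower == "soon" and next_word != "as"):
            out.append((word, "TIME"))
        else:
            out.append((word, tag))
    return out


plain = tag_time([("see", "VB"), ("you", "PRP"), ("soon", "RB")])
blocked = tag_time([("as", "RB"), ("soon", "RB"), ("as", "IN"), ("possible", "JJ")])
```

Here the soon in see you soon is relabelled, while the soon in as soon as possible keeps its original tag.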
TO: Infinitives. The tag for infinitives is the Stanford Treebank tag TO. The Stanford Tagger does not distinguish between to as an infinitive marker and to as a preposition, so an algorithm was written to identify the prepositional instances. It finds the occurrences of to followed by a subordinating conjunction (IN), a cardinal number (CD), a determiner (DT), an adjective (JJ), a possessive pronoun (PRP$), WH words (WP$, WDT, WP, WRB), a pre-determiner (PDT), a noun (N, NNP, NP, NPs), or a pronoun (PRP) and tags the word as a preposition. All other instances of to are treated as infinitive markers and are therefore taken to signal infinitive clauses.

TOBJ: That relative clauses on object position (e.g. the dog that I saw). In this category the word that is preceded by a noun and followed by a determiner (DT, QUAN, CD), a subject form of a personal pronoun, a possessive pronoun (PRP$), the pronoun it, an adjective (JJ), a plural noun (NNS), a proper noun (NNP) or a possessive noun (a noun (N) followed by a genitive marker (POS)). However, as noted by Biber, this algorithm does not distinguish between simple noun complements and relative clauses.

TPP3: Third person pronouns. This category includes: she, he, they, her, him, them, his, their, himself, herself, themselves.

TSUB: That relative clauses on subject position* (e.g. the dog that bit me). These are occurrences of that preceded by a noun (N) and followed by an auxiliary verb or a verb (V), with the possibility of an intervening adverb (RB) or negation (XX0).

TTR: Type-token ratio. In Biber (1988), the tagger considered only the first 400 tokens and counted how many types occurred among them. The result was therefore the number of types in the first 400 words of the text.
A text with fewer than 400 tokens would therefore be left out of the analysis. Biber presumably chose 400 as a compromise between accuracy and the number of texts that could be analysed. Because this tagger can be used with texts of varying length, the user decides on this number: the tagger asks for it before tagging begins. It then counts how many types occur in the first X tokens, where X is the number supplied by the user. If the text is shorter than X, the program counts the types in the whole text. The user can choose the number on the basis of the shortest text in the corpus or of a summary statistic of the token counts across the whole corpus. By default, this number is 400. The type-token ratio variable is included in the calculation of Dimension 1 only if the user leaves this number unchanged. This was done to maintain compatibility with the calculations in Biber (1988).

VBD: Past tense. The Stanford Tagger tag VBD is used for this variable (for more detail, see http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf).

VPRT: Present tense. Any verb tagged by the Stanford Tagger as VBP or VBZ (present tense or third person present) is tagged by this tagger as VPRT (for more detail, see http://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf).

WHCL: WH-clauses (e.g. I believed what he told me). This tag is applied when the following pattern is found: any public, private or suasive verb followed by a WH word, followed by a word that is not an auxiliary (tag MD for modals, or a form of DO, HAVE or BE).
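The type-token ratio (TTR) procedure described above, counting types in the first X tokens and falling back to the whole text when it is shorter, can be sketched as follows. The function names, the case-insensitive comparison and the expression of the ratio as a plain fraction are assumptions made for this example, not details confirmed by the manual.

```python
from typing import List


def count_types(tokens: List[str], limit: int = 400) -> int:
    """Count distinct word forms among the first `limit` tokens.

    Slicing past the end of a list is safe in Python, so a text shorter
    than `limit` is simply counted in full.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive number of tokens")
    window = tokens[:limit]
    # Assumption: types are compared case-insensitively.
    return len({token.lower() for token in window})


def type_token_ratio(tokens: List[str], limit: int = 400) -> float:
    """Types divided by the number of tokens actually considered."""
    window = tokens[:limit]
    if not window:
        return 0.0
    return count_types(tokens, limit) / len(window)


words = "the cat saw the other cat".split()
types_in_whole_text = count_types(words)      # text shorter than 400 tokens
types_in_first_four = count_types(words, 4)   # user-supplied window
```

With the six-word example, the default window covers the whole text, while a window of four tokens stops after the second the.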
WHOBJ: WH relative clauses on object position (e.g. the man who Sally likes). This tag is applied when the following pattern is found: any word that is not a form of ASK or TELL followed by any word, followed by a noun (N), followed by any word that is not an adverb (RB), a negation (XX0), a verb or an auxiliary verb (MD, forms of HAVE, BE or DO).

WHQU: Direct WH-questions. Any punctuation mark followed by a WH word (what, where, when, how, whether, why, whoever, whomever, whichever, wherever, whenever, whatever, however) and followed by an auxiliary verb (a modal tagged MD, or a form of DO, HAVE or BE). This algorithm was modified to allow one word between the punctuation mark and the WH word, so that WH-questions containing discourse markers such as ‘so’ or ‘anyways’ are also recognised. In addition, Biber's algorithm was improved by removing however and whatever, since these words do not introduce WH-questions.

WHSUB: WH relative clauses on subject position (e.g. the man who likes popcorn). This tag is applied when the following pattern is found: any word that is not a form of ASK or TELL followed by a noun (N), then a WH pronoun, then any verb or auxiliary verb (V), with the possibility of an intervening adverb (RB) or negation (XX0) between the WH pronoun and the verb.

WZPAST: Past participial WHIZ deletion relatives* (e.g. The solution produced by this process). This tag is applied when the following pattern is found: a noun (N) or quantifier pronoun (QUPR) followed by a past participle form of a verb (VBN) followed by a preposition (PIN), an adverb (RB) or a form of BE.

WZPRES: Present participial WHIZ deletion relatives* (e.g. the event causing this decline is…)
This tag is applied when a present participle (VBG) is preceded by a noun (NN).

XX0: Analytic negation. This tag is applied to the word not and to the item n't_RB.

References

Biber, D. (1988). Variation across Speech and Writing. Cambridge: Cambridge University Press.

Biber, D. (1989). A typology of English texts. Linguistics, 27(1), 3–43.

Stanford Tagger v. 3.1.5. Retrieved from: http://nlp.stanford.edu/software/tagger.shtml.

Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman.

Graduate diploma - Estacio de Sa University

Years of experience: 8. Registered at ProZ.com: Oct 2016.

N/A

Lilt, Wordfast

English (PDF)

Bio

I am a native speaker of Brazilian Portuguese. I have worked as a translator for 1 year and as an English teacher for 23 years. I hold a bachelor's degree in Letters (English Language and Literature) from the Federal University of Minas Gerais, one of the most renowned institutions in Brazil, and a postgraduate degree in Translation Studies (English-Portuguese) from Estacio de Sa University, an outstanding Brazilian institution.

References

Ms. Fabiane Pacífico
Government agent/Former student and client/email: [email protected]

"A. Ladislau is not only a open minded but also an inspiring Translator and Teacher. Working with A. Ladislau I found her as a person with great background and deep proficiency of modern solutions. She is precise, smart and broad-minded person. Creative, insightful, efficient and loyal colleague."
Mr. Leonardo David email: [email protected]
Business consultant/Former student and client

Profile last updated
Jan 12, 2021
