LARNet: The Cyber Journal of Applied Leisure and Recreation Research

Student perceptions of teacher evaluations in a recreation curriculum: The role of student gender

H. Joey Gray, Sarah J. Young, Re.D., Indiana University
(Jan 2009)

Primary Contact:
H. Joey Gray
Department of Health & Human Performance
Office: (615) 904-8359
Email: hjgray@mtsu.edu
Abstract
The present investigation focuses upon the role student gender plays in influencing student perspectives of teacher evaluations in a recreation curriculum. A variety of variables influencing teacher evaluation outcomes at the collegiate level have been discussed in the literature, yet how students perceive these evaluations, and how that perception might influence their responses, has scarcely been explored. The findings of the study revealed gender can be indicative of student perceptions and responses on teacher evaluations. The specific categories illustrating significant differences by gender included (a) knowledge of how evaluation outcomes are used, (b) seriousness with which students take evaluations, and (c) accuracy of student responses. The study indicated that female students may be more likely than males to be knowledgeable, to take completing evaluations seriously, and to provide accurate responses. This information may assist both instructors and administrators in more accurately explaining and interpreting student responses on teacher evaluations for specific courses. Recommendations and implications for instructors and administrators are discussed.
Key Words: Student evaluations of teachers, gender, recreation, curriculum
Student evaluations of their college instructors and professors have long been the focus of research by those who seek to discover the accuracy of these evaluative measures. As a result, various factors including instructor personality (Cardy & Dobbins, 1986; Chonko, Tanner, & David, 2002; Clayson, 1999; Williams & Ceci, 1997), gender of instructor (Centra & Gaubatz, 2000; Dukes & Victoria, 1989), grade inflation (Isely & Singh, 2005), and job security (Barndt, 2001) have been explored in terms of how they influence the outcomes of student evaluations of teachers (SETs). Yet a perspective often overlooked in this area of research is the student perspective (Spencer & Schmelkin, 2002). Specifically, how might student perceptions of SETs influence their responses on the evaluative measure? Furthermore, the literature revealed no such examination of the perceptions of students enrolled in a recreation curriculum.
Do college students take their time when completing end-of-course evaluations and truly reflect upon the performance of their instructors? Simpson and Siguaw (2000) questioned students’ ability to evaluate their instructors properly and accurately. Conflicting evidence regarding the validity and reliability of students’ evaluations of their teachers has been debated for decades, yet Centra (1993) spoke to the reality of the situation. He stated, “despite discrepancies in opinions and research findings on the validity of student evaluations, it is essential for faculty to understand that [SETs] are and probably will continue to be the primary institutional measure of their teaching effectiveness” (p. 30). In light of this reality, it behooves faculty to gain a greater understanding of student perspectives regarding SETs.
College administrators and professors alike question the accuracy of SETs, requiring a clearer understanding of how students perceive SETs and how they approach these measures. More specifically, demographic variables (e.g., student gender, age, class rank, and major) may be indicative of student perceptions and potentially influence student responses on teacher evaluations. Gender, in particular, is a common demographic variable explored in various areas of research. Muñoz-Silva, Sánchez-García, Nunes, and Martins (2007) maintain that the effects of gender differences on perceptions and subsequent behaviors are well established. In their study, the authors utilized the Theory of Planned Behavior and the Theory of Reasoned Action models to explore how gender may influence behavior. While existing SET research explores the effect instructor gender may have on student perceptions (Centra & Gaubatz, 2000; Chamberlin & Hickey, 2001; Goldberg & Callahan, 1991; Moore, 1997; Martin, 1984), little research has been conducted on how student gender might influence the manner in which students complete evaluations of college instructors.
Due to limited research on student perceptions, and the absence of research in the academic area of recreation, we conducted a study to explore student perspectives regarding the uses of and rationale for SETs in a recreation curriculum. Utilizing the Theory of Planned Behavior (TpB), we examined student perceptions of SETs and how these perceptions may influence their responses (behavior) on SETs. For example, did students perceive they had intentionally provided inaccurate responses (positive or negative) on SETs? During this investigation, relationships among student perspectives and demographic variables (gender, class rank, recreation major/minor, and age) were explored and tested for significance. The study hypothesized that these independent variables would not influence student perceptions of SETs. Based on the study results, gender differences yielded the most significant findings. Thus, the present investigation discusses the findings relating specifically to gender differences in student perceptions of SETs in a recreation curriculum.
Review of Literature
Beginning in the 1920s, various forms of student evaluations of teachers (SETs) were utilized. During the mid-1960s, research on the teacher evaluation process itself became relevant to those in academia, in large part due to self-preservation. The use of SET data had begun to play a major role in the advancement of collegiate instructors regarding tenure, promotion, pay raises, and job security (Centra, 1993). Given the significance SET data could play in the livelihood of those in academia, the usefulness and accuracy of these data sets are of natural interest to academic researchers. Over the past 25 years, the focus of the research has been on the validity of SETs, reaching a peak around the 1980s (Greenwald, 1997). Two questions have largely prevailed: Are students qualified to evaluate their teachers, and if so, are they mindful when they evaluate them? Since SET data are often used in the advancement of an instructor’s career, and the data are a direct interpretation of student opinion, it is important to know the answers to these questions. Some researchers strongly question the ability of students to provide unbiased responses on SETs (Bodle, 1994; Chonko et al., 2002; Simpson & Siguaw, 2000), whereas other research supports the validity of SET data (d'Apollonia & Abrami, 1997; Hobson & Talbot, 2001; Odden, 2004; Overall & Marsh, 1980). While arguments both for and against the validity of SETs are prevalent in the literature, the debate continues without a clear victor. Notably, the majority of universities continue to utilize student data in making administrative decisions about faculty promise and performance (Isely & Singh, 2005; Read & Raghunandan, 2001). The value of SET data and how effectively these ratings are used, known as consequential validity, is the focus of McKeachie’s (1997) work. He contended that it is not the validity of SETs that should be the focus of concern, but the consistency among administrators in how the data are interpreted.
Researchers have implemented a variety of methodological approaches to unravel the factors that influence SET responses. However, SET research continues to face a great deal of scrutiny regarding the methodology, evidence, and conclusions generated by scholars because the results have varied (Chamberlin & Hickey, 2001; Feldman, 1997; Marsh, 1987; Marsh & Roche, 1997). Due to the vast discrepancy in research findings, the value of utilizing SET data in tenure and promotion decisions has fostered even more concern among those in academia (Cashin & Downey, 1992). Ultimately, the goal of the bulk of SET research has been to clarify the validity of these instruments. The majority of interest in SET data appears to stem from the magnitude of the role these data have played in administrative decisions regarding tenure and promotion, pay raises, and security of academic appointments (Colbeck, 2002). Seldin (1980) cautioned against the use of SETs in academic decisions by stating, “We wish to emphasize that student ratings of undergraduate teaching fall far short of complete assessment of an instructor’s teaching contribution” (p. 65). If indeed university administrators are going to continue to utilize SET data, which are derived solely from student opinion and may have a serious impact on an instructor’s career, it is vital to understand the student perspective of SETs and how student demographic variables may be indicative of SET responses.
Gender and SETs
Numerous researchers have indicated gender can play a role in SET responses (Centra & Gaubatz, 2000; Dukes & Victoria, 1989; Feldman, 1993; Moore, 1997; Martin, 1984); however, the focal point of these studies was the influence of instructor gender rather than student gender. Nonetheless, the results were significant, as they indicated gender could be an influential factor in SET responses. Specifically, female faculty are expected by students to be friendly and warm (Martin, 1984) and to encourage questions (Feldman, 1993). Moore’s (1997) study revealed male faculty have been described by students as more scientific and knowledgeable, whereas Centra and Gaubatz (2000) found female students in particular perceived female instructors to be more organized and to possess stronger communication skills. As with most SET research, conflicting evidence does exist regarding gender and its influence on SET responses, as both Dukes and Victoria (1989) and Feldman found no differences regarding instructor gender and SET scores. However, Centra and Gaubatz noted these inconsistencies are probably due to design flaws. Feldman supported this notion, as his review of 10 SET and gender studies noted a lack of control for course and discipline among the studies.
In an effort to explore the relationship between gender and SET responses, Chamberlin and Hickey (2001) surveyed 198 students (84 males, 114 females) at the end of the academic semester, using a questionnaire consisting of items inquiring about the instructor’s highest degree received, rank and tenure status, and teaching style. The researchers found that “…female students rated male and female faculty as significantly different while male students did not differentiate between male and female faculty on [the] same items” (Chamberlin & Hickey, p. 12). Specifically, female students found female instructors to be more assertive when presenting material, more sympathetic in providing feedback, more sensitive to student needs, more helpful with student problems, more likely to encourage classroom discussion, and less likely to make students feel inferior than male instructors (Chamberlin & Hickey). These findings are somewhat disconcerting to both male and female faculty: if women are expected to have better rapport, then students may expect more from a female than from a male instructor, effectively holding a female teacher to a higher standard of rapport, and vice versa for a male instructor.
While the findings from the aforementioned research are informative and signify gender may indeed play a role in SET data, student perceptions of SETs, and how these perceptions may affect the responses students provide, remain a crucial piece of the puzzle that is still missing. Specifically, do male and female students perceive SETs differently? Additionally, recreation programs within the United States can be heavily skewed with regard to the gender of the students enrolled. For example, therapeutic recreation programs are well known to enroll a much larger female student population; a national therapeutic recreation curriculum study conducted by Stumbo, Carter, and Kim (2004) reported that, on average, 86.3% of therapeutic recreation students were female. Faculty are aware that other programs may hold higher male student enrollments as well.
Supporting the notion that gender does influence SETs, both a comprehensive literature review by Morgan and Ogden (1981) and an empirical study by Darby (2006) maintain student gender should be considered when reviewing SET data. Building upon the review of Morgan and Ogden, Darby found that females and males responded differently on evaluations. Darby’s study examined 504 male and female students to uncover whether gender differences existed in students’ evaluative abilities. In the first part of the study, Darby tested the subjects in areas linked to evaluation abilities such as notability, recall, and relation to others. Males scored lower than females in all three areas. In part two of her study, Darby accounted for variables often thought to be linked to gender and SETs: she controlled for instructor gender, class size, course content, teaching skill, instructor attractiveness, and evaluative method. Likert-scale evaluations showed no gender differences in response to instructors, whereas females tended to be more positive than males on open-ended evaluations. When both scale types were combined, females were generally more favorable in their responses than were their male counterparts. Darby’s work underscores the fact that gender differences do exist in how males and females respond to their instructors, but that this pattern is not completely straightforward. Based on the information gathered in this review and the findings of our larger study on student perceptions of SETs, it is apparent that one must have a clear understanding of both the student perspective of SETs and the role gender may play. Thus, the focus of the present investigation is to explore the influence gender may have on student perceptions of SETs.
Instrumentation
The survey instrument, titled the Student Perceptions of Student Evaluations of Teachers (SPSET) questionnaire, was designed for the current study based upon the work of Smith and Carney (1990) and incorporated the Theory of Planned Behavior (Ajzen & Fishbein, 2005). Smith and Carney surveyed students’ perceptions of teacher evaluations in introductory psychology and education courses using a 31-item questionnaire. The SPSET questionnaire contained 52 close-ended questions, including continuous and categorical measurement levels, in the following five categories: (a) demographics, (b) knowledge of purposes (uses) of SETs, (c) seriousness with which students respond to SETs, (d) perceived value of SET feedback, and (e) accuracy of SET responses. The Theory of Planned Behavior (TpB) assists in explaining the importance of understanding student perceptions of SETs. The purpose of TpB is to predict and understand attitudes and behavior. According to the theory, behavioral intent is the vital determinant of a person's actions. A combination of subjective norms and one’s attitude toward performing the behavior makes up an individual’s intention to perform a behavior. Behavioral and normative beliefs, subjective norms, evaluations of behavioral outcome, and motivation to comply generate one’s attitude toward the behavior (Ajzen & Fishbein). Thus, the TpB was used as the basis of the structure for the SPSET questionnaire to examine how student perceptions of SETs are influenced by their attitudes, subjective norms, perceived behavioral control, and intentions, which in turn ultimately influence their behavior (actual responses) on SETs.
As recommended by Ajzen (2002), the questionnaire items were delivered in a randomized order to avoid systematic responses by subjects. To test the accuracy of responses, it was necessary to ask selected questions in multiple ways to obtain a reliable self-report measure; thus, several questions targeting the same information using various positive and negative anchors were included. These positive and negative anchors, also known as semantic differentials, were presented in an opposite manner for each question that was repeated (e.g., definitely true – definitely false; extremely unlikely – extremely likely). Moreover, these anchors used active-voice verbiage with a degree of novelty to reduce mindless, repetitive responses. To measure belief strength and outcome evaluation, responses used a unipolar seven-point scale. While a five-point scale could have been used, a seven-point scale assisted with identifying variability and differences in responses; increased answer choices allowed for a clearer examination of true differences rather than a limited or lumped choice format (Ajzen). Therefore, both verbiage and visual differentiation were implemented to assist with reliability.
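The randomized item ordering and reversed semantic-differential anchors described above can be sketched as follows. This is an illustrative sketch only: the item texts, codes, and function names are hypothetical, not the actual SPSET items.

```python
import random

def reverse_score(response: int, scale_max: int = 7) -> int:
    """Re-key an item answered on a 1..scale_max semantic differential whose
    anchors were presented in the opposite direction (e.g., 'definitely
    false' on the left instead of 'definitely true')."""
    return scale_max + 1 - response

# Hypothetical item pool; paired items target the same information with
# opposite anchors, as the SPSET did. The boolean flags reversed anchors.
items = [
    ("K1", "Instructors use SET data to improve courses", False),
    ("K1r", "Instructors ignore SET data when revising courses", True),
    ("S1", "I take completing SETs seriously", False),
]

def administer(items, seed=None):
    """Return a copy of the item pool in randomized order, so that no two
    subjects see the items in the same systematic sequence."""
    order = items[:]
    random.Random(seed).shuffle(order)
    return order

# A response of 6 on a reversed-anchor item corresponds to 2 on the
# original direction of the 7-point scale.
print(reverse_score(6))
```

Re-keying the reversed items before analysis puts all paired questions on a common scale, which is what makes the cross-check of response accuracy possible.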
Pre-Testing the Instrument
The validity and reliability of the SPSET questionnaire were tested using both a panel of experts and a pilot study. The panel of experts consisted of three individuals: the Executive Associate Dean, who oversees the pedagogical aspects of the School of HPER; the Undergraduate Coordinator, who oversees the curriculum of the recreation department; and the Director of the Center for Evaluation. The panel reviewed the questionnaire for content, reliability, and clarity of word choice. Based upon the feedback provided by the panel, four questions were deleted and four questions were added, with minor adjustments made for clarity of word choice, leaving the SPSET at 52 items.
The questionnaire was then pilot tested using a sample of 58 undergraduate students in a health class consisting of recreation, health, and kinesiology majors at a Carnegie Extensive Research institution. Subjects of the pilot test were asked to identify confusing or inappropriate items, ask questions for clarification, and provide feedback regarding clarity of the items on the questionnaire. Reliability analysis for internal consistency resulted in a Cronbach’s alpha coefficient of .810 for the overall instrument, indicating the questionnaire was consistent in measuring student perceptions of teacher evaluations. Cronbach’s alpha coefficients were also obtained for each of the outcome categories: knowledge α = .673, seriousness α = .866, and accuracy α = .686. Initially, the reliability analysis of items in the value of feedback category resulted in a Cronbach’s alpha coefficient of .476. Upon removal from this category of a question regarding the seriousness with which instructors review SET data, the Cronbach’s alpha coefficient increased to .663. Because of the dramatic change in coefficients associated with this item, it was determined that subjects may have perceived the item differently than the researchers originally intended; consequently, the item was removed from that section. Given the acceptable overall Cronbach’s alpha coefficient (α = .810) and an appropriate sample size, the results of the pilot test indicated that only a few adjustments to the SPSET were necessary. As a result, minor editing took place for clarity of word choice on a few of the items. Once those changes were made, the SPSET remained at a length of 52 items and was ready for distribution.
Data collection
The questionnaire was designed to capture the perspectives of students enrolled in recreation courses at a Carnegie Extensive research university located in the Midwestern region of the United States. Letters were distributed to all course instructors during the 2006 spring semester requesting about 15 minutes of class time to administer the survey. Undergraduate students enrolled in 2- and 3-credit-hour recreation courses offered during the spring semester were targeted. Thirty-four of 36 instructors gave their consent to have the survey distributed; as a result, 34 courses involving approximately 1,682 students were eligible to participate in the study. Because the probability was high that students were enrolled in more than one recreation course during data collection, students were asked to complete the questionnaire only once. In total, 523 subjects completed the questionnaire.
Statistical analysis
Descriptive statistics (frequencies and measures of central tendency) were computed for each item in the survey and used to organize and characterize the data. Chi-square Tests of Independence were conducted for each Likert-scale statement in the questionnaire to identify relationships between the predictor (demographic) variables (i.e., gender, recreation major/non-major, age, and class status) and the criterion variables. The Chi-square Tests of Independence indicated whether the frequencies of subject responses were greater than could be expected by chance alone. SPSET items found to be significant (p < .01) indicated the variables were related to one another. However, the Chi-square value alone merely establishes whether a significant relationship exists among variables; it indicates neither the strength of that relationship nor the specific relationship among variables. Thus, once items were found to be significant, Cramér’s V was used to ascertain the strength of the relationship between variables (Pett, 1997). Additionally, standardized residuals (+/- 2.33 or greater) were examined as a post-hoc analysis to determine where the significant differences emerged within the Chi-square values, as recommended by Pett.
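The analysis pipeline described above (Chi-square Test of Independence, then Cramér's V for strength, then standardized residuals for cell-level differences) can be sketched with `scipy`. The counts used here are the men/women responses to SPSET item 38 as reported in Table 4; the variable names and print formatting are our own:

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2 x 3 contingency table: rows = men, women; columns = disagree,
# sometimes, agree (SPSET item 38 counts, as reported in Table 4).
observed = np.array([
    [114, 96, 9],    # men
    [222, 78, 3],    # women
])

# Chi-square Test of Independence (no Yates correction for tables
# larger than 2 x 2).
chi2, p, df, expected = chi2_contingency(observed)

# Cramér's V: strength of association for an r x c table.
n = observed.sum()
v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))

# Standardized residuals: cells beyond +/- 2.33 locate the differences.
std_resid = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.6f}, V = {v:.2f}")
print(np.round(std_resid, 2))
```

Run on these published counts, the sketch reproduces a chi-square statistic and Cramér's V in line with the values reported for item 38 in the Results section.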
Because multiple tests of significance were used in the analysis, adjustments were made for the replication and accumulation of Type I error. A standard Bonferroni correction would require the alpha level to be set at .002. Yet setting the alpha level too stringently leads to a greater likelihood of a Type II error. Lipsey (1990) argues there is justification for relaxing error risk to accommodate limits of effect size. In the current analysis, a Bonferroni correction adjustment with 20 degrees of freedom (df = 20) would increase the likelihood of a Type II error. As a result, the decision was made to set the level of significance at .01, reducing the possibility of a Type I error while not overcorrecting and allowing a greater possibility of a Type II error.
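The trade-off being weighed above can be made concrete: a Bonferroni correction simply divides the family-wise alpha by the number of comparisons. The text does not state the exact number of comparisons, so the counts below are illustrative; note that 25 tests at a family-wise α of .05 reproduces the .002 per-test level mentioned above:

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests

# With a family-wise alpha of .05, the per-test level shrinks quickly
# as the number of significance tests grows, raising Type II risk.
for m in (5, 10, 20, 25):
    print(m, bonferroni_alpha(0.05, m))
```

The compromise taken in the study, fixing the per-test level at .01 rather than the full Bonferroni value, sits between the uncorrected .05 and the fully corrected .002.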
Results
The study consisted of 523 subjects between the ages of 18 and 54 years across the following academic ranks: freshmen (16.8%, n = 88), sophomores (25.6%, n = 134), juniors (29.8%, n = 156), and seniors (27.7%, n = 145). Freshmen represented the smallest percentage of undergraduates because recreation is often a ‘discovery’ major, meaning students frequently do not enroll in the major until their sophomore or junior years on campus. Women represented 57.9% of the sample (n = 303) while men represented 42.1% (n = 220). Additionally, subjects were categorized as recreation majors/minors (70.2%) and non-recreation majors/minors (29.8%), as not all students enrolled in recreation courses are recreation majors or minors. All demographic data are summarized in Table 1.
Table 1. Undergraduate subject demographics

Demographic                       f       %
Undergraduate                   523     100
  Freshman                       88    16.8
  Sophomore                     134    25.6
  Junior                        156    29.8
  Senior                        145    27.7

  Male                          220    42.1
  Female                        303    57.9

  Recreation Major/Minor        367    70.2
  Non-Recreation Major/Minor    156    29.8

Age
  18-19                         127    24.3
  20                            147    28.1
  21                            127    24.3
  22-54                         122    23.3

Note. N = 523.
Of all the demographic data collected in this study, gender appeared to have the most significant impact on student perceptions of SETs. The data were organized and analyzed categorically, as it was hypothesized there would be no relationship between gender and subjects’ knowledge of the uses and purposes of SETs, the seriousness with which they completed SETs, their perception of the value of their feedback on SETs, or their perceptions of the accuracy of their SET responses. What follows is a presentation of the findings in terms of knowledge, seriousness (attitude), value, and accuracy as each relates to gender.
Knowledge and gender
Chi-square Tests of Independence indicated gender was predictive of responses more than could be expected by chance alone for nine SPSET items in the knowledge of purposes and uses category (see Table 2).
Table 2 (Continued). SPSET knowledge by gender

                                                     Men            Women
Question / Response                                n      %       n      %      χ²

Question 14: Knew responses impact job performance                            9.92**
  False                                           17     7.7     17     5.6
  Sometimes                                       95    43.2     95    31.5
  True                                           108    49.1    190    62.9
  Total                                          220   100      302   100

Question 23a: Used to give instructors pay raises                             9.46**
  False                                          103    46.8    146    48.3
  Sometimes                                       90    40.9    141    46.7
  True                                            27    12.3     15     5.0
  Total                                          220   100      302   100

Question 23d: Used to make improvements in courses                           18.07***
  False                                           28    12.7     15     5.0
  Sometimes                                       95    43.2    104    34.6
  True                                            97    44.1    182    60.5
  Total                                          220   100      301   100

Question 23e: Used to make improvements in teaching style                    21.74***
  Unlikely                                        39    17.7     21     7.0
  Sometimes                                      111    50.5    136    45.0
  Likely                                          70    31.8    145    48.0
  Total                                          220   100      302   100

Note. *p < .05. **p < .01. ***p < .001.
Chi-square results indicated a significant but weak association between gender and the knowledge questions presented in Table 2. Examination of the standardized residuals revealed three items contained one or more cells at +/- 2.33 or greater. Although the majority of men and women indicated SETs were either not used (44.3%) or only used ‘sometimes’ (47.7%) in merit decisions, of those who did indicate SETs were used for faculty merit decisions, men were significantly more likely to do so than women (χ² = 9.466, p = .009, Cramér’s V = .13). Only a little more than half of respondents (53.6%) perceived that SET data were used by instructors to make course improvements. Of those subjects who indicated instructors did not use SETs to make course improvements, men were significantly more likely to do so than women (χ² = 18.077, p < .001, Cramér’s V = .18). Similar results were found for a question regarding teaching styles: men indicated their SET responses were more likely to be influenced by an instructor’s teaching style than women did (χ² = 21.749, p < .001, Cramér’s V = .20).

The following items did not contain cells with standardized residuals greater than +/- 2.33; however, these items were found to be significant, and their apparent relationships are presented. When subjects were asked whether instructors use SETs to make course improvements, men and women were similar when reporting ‘somewhat’ (57.4%). However, men reported ‘false’, and women reported ‘true’, more often than expected. In other words, although this item suggests that both genders held the perception that SETs are used ‘somewhat’ to make course improvements, the statistical data revealed females were significantly more likely than males to believe SETs were used to make course improvements. In a related item, most men and women indicated instructors do read SET data, yet more female subjects than expected stated yes to this question. This could be interpreted to mean women are more likely to believe that instructors care enough about students’ comments to read them. However, women were more likely than expected to indicate they did not know if university officials read SET data. Three SPSET items addressed how increased knowledge about how SET data are used may influence responses. The data revealed women were more likely to agree that increased knowledge would influence their SET responses on all three questions.
Seriousness and gender
Chi-square Tests of Independence indicated gender was predictive of responses more than could be expected by chance alone for three SPSET items that dealt with the seriousness with which students completed SETs (see Table 3).
Table 3. Seriousness by gender

                                                     Men            Women
Question / Response                                n      %       n      %      χ²

Question 11: Take time completing SETs                                        9.19**
  Few to None                                     42    19.1     31    10.2
  Sometimes                                      125    56.8    202    66.7
  Most of the time                                53    24.1     70    23.1
  Total                                          220   100      303   100

Question 28: If classmates leave, I will too                                 19.93***
  True                                            45    20.5     27     8.9
  Sometimes                                       86    39.1    104    34.3
  False                                           89    40.5    172    56.8
  Total                                          220   100      303   100

Question 36: I take completing SETs seriously                                16.65***
  True                                            47    21.4    112    37.0
  Sometimes                                      123    55.9    148    48.8
  False                                           50    22.7     43    14.2
  Total                                          220   100      303   100

Note. *p < .05. **p < .01. ***p < .001.
Chi-square results indicated a significant but weak association between gender and the SPSET seriousness questions. Examination of the standardized residuals revealed three items contained one or more cells at +/- 2.33 or greater. When asked about taking their time in completing SETs, 62.5% of the subjects indicated that only ‘about half the time’ do they take their time completing SETs. Additionally, men were more likely than women to indicate they did not take their time when completing SETs (χ² = 9.198, p = .010, Cramér’s V = .13). Furthermore, the majority of subjects indicated they would not leave the room if a classmate left before completing SETs (49.9%); however, of those who indicated they would leave the room, men were more likely to do so than women (χ² = 19.930, p < .001, Cramér’s V = .19). Finally, of those who indicated they took completing SETs seriously, men were less likely than women to do so, although the majority of subjects (51.8%) indicated they took SETs seriously only some of the time (χ² = 16.653, p < .001, Cramér’s V = .17).
Value of feedback and gender
Chi-square Tests of Independence indicated no significant relationship between gender and items in the SPSET value of feedback category. Thus, gender was not predictive of responses regarding subjects’ perceptions of how their feedback on SETs was perceived by instructors and administrators.
Accuracy and gender
Chi-square Tests of Independence indicated gender was predictive of responses more than could be expected by chance alone for the items presented in Table 4.
Table 4. Accuracy by gender

                                                     Men            Women
Question / Response                                n      %       n      %      χ²

Question 10: Intend to be fair and accurate                                  16.20***
  False                                           26    11.8     20     6.6
  Sometimes                                       98    44.5     98    32.3
  True                                            96    43.6    185    61.1
  Total                                          220   100      303   100

Question 16: My SET responses reflect exactly how I feel                      7.90**
  Disagree                                        27    12.3     17     5.6
  Sometimes                                      108    49.1    150    49.5
  Agree                                           85    38.6    136    44.9
  Total                                          220   100      303   100

Question 33: Time of day can influence                                        8.09**
  Disagree                                        95    43.2    169    55.8
  Sometimes                                      104    47.3    111    36.6
  Agree                                           21     9.5     23     7.6
  Total                                          220   100      303   100

Question 38: Tend to give females higher marks                               26.75***
  Disagree                                       114    52.1    222    73.3
  Sometimes                                       96    43.8     78    25.7
  Agree                                            9     4.1      3     1.0
  Total                                          219   100      303   100

Note. *p < .05. **p < .01. ***p < .001.
Chi-square results indicated a significant but weak association between gender and the SPSET accuracy items. Examination of the standardized residuals revealed three questions contained one or more cells at +/- 2.33 or greater. The data revealed men were less likely to indicate they intended to record SET responses that were fair and accurate (χ² = 16.207, p < .001, Cramér’s V = .17). Nearly half of subjects (49.3%) reported that their SET responses were exact indications of their opinions only ‘sometimes’, and 42.3% indicated ‘agree’. Of those who stated ‘disagree’, men were less likely to do so than women (χ² = 7.906, p = .019, Cramér’s V = .12). Results also depicted women as more likely than men to disagree that they would issue high SET marks to female instructors (χ² = 26.752, p < .001, Cramér’s V = .22). SPSET item 33 did not contain cells with standardized residuals greater than +/- 2.33; however, this item was found to be significant, and the apparent relationships are presented: women were less likely than men to indicate the time of day a course was offered could influence SET responses, as the majority of men (47.3%) indicated that the time of day a course was offered (e.g., an 8:00 a.m. class) could ‘sometimes’ influence their responses on SETs.
The findings of the present investigation revealed that gender might be indicative of how students perceive SETs, supporting the results of Darby's (2006) research on gender differences in evaluations. Based upon this finding, knowing the gender composition of students could help faculty better explain and interpret SET response patterns for their courses. Faculty who gain greater insight into student perceptions and the gender differences associated with them would likely have a better understanding of less-than-desirable SET responses (i.e., when instructors perceive and/or experience good rapport and classroom relationships with their students, yet their SET scores do not support that perception). However, pause is warranted when considering whether to collect demographic data (e.g., gender, age), as student identity could quickly be determined in courses with low enrollments. Could the benefits of collecting demographic data outweigh the costs? Perhaps a minimum enrollment could be required before demographic data are collected, to avoid divulging student identity. The remainder of the discussion focuses upon the categories of knowledge of purpose, seriousness, and accuracy of perceptions by gender, as significant differences were found in each of these areas.
In terms of students’ knowledge of how SET data are used, the students in this study knew very little about who, beyond their instructor, would see their feedback. This finding is particularly noteworthy, as instructors, especially those who teach upper-level courses to juniors and seniors, may assume that students already know how SET data might be used to evaluate them in the classroom. Marlin (1987) found a similar result in a study of student perceptions of SETs when he concluded that students perceive their input on SETs has little effect upon the careers of faculty. Instructors of college courses could help themselves by sharing with their students who (e.g., department chairs, deans) reads the SET data and what happens to the information after it is read. This could boost students’ knowledge of the purpose of SETs, which might in turn influence the seriousness and accuracy with which students complete the evaluations.
Another dimension of knowledge of the purpose of SETs was whether students perceived that their instructor read the evaluations. While women were more likely to believe their instructors actually used the feedback to improve their courses, overall the subjects in this study were skeptical that instructors made improvements based on student input. Again, Marlin’s (1987) study revealed a similar result when he stated students believed “nobody pays much attention nor does much as a result of the outcome of the evaluation process” (p. 714). The implication of this finding is that recreation faculty can help their students understand the importance of SETs by sharing in each class how the results of these evaluations are used.
Overall, the subjects in this study did not take SETs very seriously, especially when compared to the seriousness with which the data are reviewed by administrators and peers in making faculty promotion and tenure decisions. The data revealed that students take a serious attitude in completing SETs only ‘some of the time’ or ‘about half the time.’ Specifically, women were more likely than men to report that they attempted to complete SETs in a serious manner. Does this mean that administrators and instructors should concentrate more on the SET data provided by women? Obviously not, but this finding does imply that educators should consider obtaining more demographic data on SETs, as this information could place individual scores in a more accurate context. Perhaps an instructor receives SET scores far beyond those of a peer teaching a different section of the same course. While demographic data cannot explain truthfulness or the ultimate cause behind erratic scores, this information may lend some assistance if, for example, 86% of the students enrolled were women, as opposed to other courses taught by the same instructor where enrollment was 35% women and 65% men. While this example is somewhat exaggerated, gender is one of many demographics that could be collected and monitored on SETs. If various demographic variables were collected and monitored more closely, particularly in recreation curricula, a substantially better understanding of the student perspective of SETs, and how these perspectives may influence SET responses, might be gained. Including demographic variables as part of the SET instrument could ultimately lead to more precise interpretation of the data for both formative and summative evaluative purposes.
Do students provide accurate information on SETs? The findings of this study indicated the majority of students perceive they provided accurate and honest responses most of the time. However, some researchers argue that student expectations can influence SET responses (Andersen & Miller, 1997), which may contribute to grade inflation (Singleton, 1978; Sonner, 2000), ultimately yielding inaccurate SET data. A scenario such as receiving a lower test or assignment grade than expected might influence the accuracy of student responses, causing students to intentionally give their instructor lower ratings. Isely and Singh (2005) discovered a similar result when they found higher-than-expected grades did influence SETs. Chonko et al. (2002) concurred by stating, “The grade received in the class is likely to play a disproportionate role in the students’ evaluation of the course” (p. 279). The present investigation, however, could not confirm these findings.
The evidence presented in this study suggests that in some cases students provide less than accurate SETs. In particular, men indicated they were more likely than women to be influenced by an instructor’s teaching style. Teaching style is akin to one’s personality, and these traits and skills vary among instructors. Langbein (1994) noted that faculty personality traits appear to have the largest impact on SET responses, a finding supported by several researchers who agreed that instructor personality influences student SET responses (Ambady & Rosenthal, 1993; Cardy & Dobbins, 1986; Chonko et al., 2002; Clayson, 1999; Marsh & Roche, 1997; Williams & Ceci, 1997). As such, teaching style should not be the sole predictor of an effective teacher, and issuing lower SET scores based solely on one’s preference of teaching style is not necessarily indicative of poor teaching performance and is seemingly unfair. Indeed, some students may find certain teaching styles more favorable than others, and some teaching styles may foster more enriched learning experiences for some students; thus, a student’s preference of teaching style is not completely objective. For example, students may dislike an instructor because his or her teaching style is ‘too challenging’ for their liking, even though that challenge can yield a greater learning experience. Receiving a lower rating because of this type of teaching style is a biased result. Again, the implication of this finding is the necessity of the instructor acknowledging the importance of objective, sincere student feedback and honestly sharing with students the importance of their comments and ratings prior to the distribution of the evaluation form.
Based on the results of the present investigation, future research should examine student gender differences in SET responses and their impact on the accuracy of SET data. Additionally, other demographic variables such as cultural background and ethnicity might influence how students perceive the evaluation of their instructors, and future research should include an investigation of these variables. Further, a qualitative case study could examine student perspectives of SETs in recreation curricula, comparing those findings with the present investigation and exploring why students hold views of their own actions that differ from those of their peers.
The data from the present investigation yielded interesting results contributing to the existing SET literature. The student perspective, along with the role student demographics may play in SET responses, has been neglected by researchers. The results of the present investigation provided compelling data indicating that a relationship between student gender and student perceptions may exist, and suggested this relationship could influence student SET responses. Can we reach a better understanding of the accuracy of SET data and what, if anything, these data divulge about instructor performance? Based on the evidence presented in this study, a better understanding of student perspectives and of the gender patterns that may exist among SET responses indeed sheds light on the complicated puzzle of student evaluations of their college professors.

References
Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431-441.
Andersen, K., & Miller, E. D. (1997). Gender and student evaluations of teaching. Political Science & Politics, 30, 216-219.
Ajzen, I. (2002). Constructing a TpB questionnaire: Conceptual and methodological considerations. Boston: UMASS. Retrieved September 22, 2007, from http://www.people.umass.edu/aizen/pdf/tpb.measurement.pdf
Ajzen, I., & Fishbein, M. (2005). The influence of attitudes on behavior. In D. Albarracín, B. T. Johnson, & M. P. Zanna (Eds.), The handbook of attitudes (pp. 173-221). Mahwah, NJ: Erlbaum.
Barndt, R. J. (2001). Fiscal policy effects on grade inflation. Retrieved April 23, 2006, from www.newfoundations.com/policy/barndt.html
Bodle, J. V. (1994). Evaluating student evaluations: The search for perspective. Journalism Education, 49, 76-81.
Cardy, R. L., & Dobbins, G. H. (1986). Affect and appraisal accuracy: Liking as an integral dimension of evaluating performance. Journal of Applied Psychology, 71(4), 672-678.
Cashin, W. E., & Downey, R. G. (1992). Using global student rating items for summative evaluation. Journal of Educational Psychology, 84, 563-572.
Centra, J. A. (1993). Reflective faculty evaluation: Enhancing teaching and determining faculty effectiveness. San Francisco: Jossey-Bass.
Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? Journal of Higher Education, 71(1), 17-33.
Chamberlin, M. S., & Hickey, J. S. (2001). Student evaluations of faculty performance: The role of gender expectations in differential evaluations. Educational Research Quarterly, 25(2), 3-14.
Chonko, L. B., Tanner, J. F., & David, R. (2002). What are they thinking? Students' expectations and self-assessments. Journal of Education for Business, 77(5), 271-279.
Clayson, D. E. (1999). Students' evaluation of teaching effectiveness: Some implications of stability. Journal of Marketing Education, 21(1), 68-75.
Colbeck, C. L. (2002). Integration: Evaluating faculty work as a whole. New Directions for Institutional Research, 114, 43-52.
d'Apollonia, S. & Abrami, P. C. (1997). Navigating student ratings of instruction. American Psychologist, 52(11), 1198-1208.
Darby, J. A. (2006). Evaluating courses: An examination of the impact of student gender. Educational Studies, 32(2), 187-199.
Doyle, K. O. (1983). Evaluating teaching. Lexington, MA: Lexington Books.
Dukes, R. L., & Victoria, G. (1989). The effects of gender, status, and effective teaching on the evaluation of college instruction. Teaching Sociology, 17, 447-457.
Ellett, C. D., & Teddlie, C. (2003). Teacher evaluation, teacher effectiveness and school effectiveness: Perspectives from the USA. Journal of Personnel Evaluation in Education, 17(1), 101-108.
Feldman, K. A. (1993). College Students’ views of male and female college teachers: Part I – evidence from students’ evaluations of their classroom teachers. Research in Higher Education, 34(2), 151-211.
Feldman, K. A. (1997). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart, (Eds.), Effective teaching in higher education: Research and Practice: 369-395. New York: Agathon Press.
Goldberg, G., & Callahan, J. (1991). Objectivity of student evaluations of instructors. Journal of Education for Business, 66(6), 377-378.
Greenwald, A. G. (1997). Validity concerns and usefulness of student ratings of instruction. American Psychologist, 52(11), 1182-1186.
Hobson, S. M., & Talbot, D. M. (2001). Understanding student evaluations. College Teaching, 49(1), 26-31.
Isely, P., & Singh, H. (2005). Do higher grades lead to favorable student evaluations? Journal of Economic Education, 36(1), 29-42.
Langbein, L. L. (1994). The validity of student evaluations of teaching. PS, 27, 545-553.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage Publications.
Marlin, J. W. (1987). Student perceptions of end of course evaluations. Journal of Higher Education, 58(6), 704-716.
Martin, E. (1984). Power and authority in the classroom: Sexist stereotypes in teaching evaluations. Signs: Journal of Women in Culture and Society, 24, 128-133.
Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Education Research, 11(3), 263-353.
Marsh, H. W., & Roche, L. A. (1997). Making students' evaluations of teaching effectiveness effective. American Psychologist, 52(11), 1187-1197.
McKeachie, W. J. (1997). Student ratings: The validity of use. American Psychologist, 52, 1218-1225.
Moore, M. (1997). Student resistance to course content: Reactions to gender of the messenger. Teaching Sociology, 25, 128-133.
Morgan, B. B. & Ogden, G. D. (1981). Non-instructional correlates of student ratings: A brief review. International Review of Applied Psychology, 30(3), 409-427.
Muñoz-Silva, A., Sánchez-García, M., Nunes, C., & Martins, A. (2007, October). Gender differences in condom use prediction with Theory of Reasoned Action and Planned Behavior: The role of self-efficacy and control. AIDS Care, 19(9), 1177-1181.
Odden, A. (2004). Lessons learned about standards-based teacher evaluation systems. Peabody Journal of Education, 79(4), 126-137.
Overall, J. U., & Marsh, H. W. (1980). Students' evaluations of instruction: A longitudinal study of their stability. Journal of Educational Psychology, 72, 321-325.
Pett, M. A. (1997). Nonparametric statistics for health care research: Statistics for small samples and unusual distributions. Thousand Oaks, CA: Sage Publications.
Read, W. J., & Raghunandan, D. V. (2001, March/April). The relationship between student evaluation of teaching and faculty evaluations. Journal of Education for Business, 189-192.
Seldin, P. (1980). Successful faculty evaluation programs. Crugers, NY: Coventry Press.
Sidanius, J., & Crane, M. (1989). Job evaluation and gender: The case of university faculty. Journal of Applied Social Psychology, 19, 174-197.
Simpson, P. M. & Siguaw, J. A. (2000). Students’ evaluation of teaching: An exploratory study of faculty response. Journal of Marketing Education, 22, 199-213.
Singleton, R. (1978). Effects of grade inflation on satisfaction with final grade: A case of relative deprivation. The Journal of Social Psychology, 105, 37-41.
Smith, M. C., & Carney, R. N. (1990, April). Students’ perceptions of the teaching evaluation process. Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.
Sonner, B. (2000). A is for “adjunct”: Examining grade inflation in higher education. Journal of Education for Business, 5-7.
Spencer, K. J. & Schmelkin, L. P. (2002). Student perspectives on teaching and its evaluation. Assessment & Evaluation in Higher Education, 27(5), 397-409.
Williams, W. M., & Ceci, S. J. (1997). How’m I doing? Problems with students’ ratings of instructors and courses. Change, 29(5), 12-23.