LARNet: The Cyber Journal of Applied Leisure and Recreation Research

Student perceptions of teacher evaluations in a recreation curriculum: The role of student gender

H. Joey Gray, Sarah J. Young, Re.D., Indiana University
(Jan 2009)

Primary Contact:
H. Joey Gray
Department of Health & Human Performance
Office: (615) 904-8359
Email: hjgray@mtsu.edu
Abstract
The present investigation focuses upon the role student gender plays in influencing student perspectives of teacher evaluations in a recreation curriculum. A variety of variables influencing teacher evaluation outcomes at the collegiate level have been discussed in the literature, yet how students perceive these evaluations, and how that perception might influence their responses, has scarcely been explored. The findings of the study revealed gender can be indicative of student perceptions and responses on teacher evaluations. The specific categories illustrating significant differences by gender included (a) knowledge of how evaluation outcomes are used, (b) seriousness with which students take evaluations, and (c) accuracy of student responses. The study indicated that female students may be more likely than males to be knowledgeable, to take completing evaluations seriously, and to provide accurate responses. This information may assist both instructors and administrators in more accurately explaining and interpreting student responses on teacher evaluations for specific courses. Recommendations and implications for instructors and administrators are discussed.
Key Words: Student evaluations of teachers, gender, recreation, curriculum
Student evaluations of their college instructors and professors have long been the focus of research by those who seek to discover the accuracy of these evaluative measures. As a result, various factors including instructor personality (Cardy & Dobbins, 1986; Chonko, Tanner, & David, 2002; Clayson, 1999; Williams & Ceci, 1997), gender of instructor (Centra & Gaubatz, 2000; Dukes & Victoria, 1989), grade inflation (Isely & Singh, 2005), and job security (Barndt, 2001) have been explored in terms of how they influence the outcomes of student evaluations of teachers (SETs). Yet a perspective often overlooked in this area of research is the student perspective (Spencer & Schmelkin, 2002). Specifically, how might student perceptions of SETs influence their responses on the evaluative measure? Furthermore, the literature revealed no such examination of the perceptions of students enrolled in a recreation curriculum.
Do college students take their time when completing end-of-course evaluations and truly reflect upon the performance of their instructors? Simpson and Siguaw (2000) questioned students’ ability to evaluate their instructors properly and accurately. Conflicting evidence regarding the validity and reliability of students’ evaluations of their teachers has been debated for decades, yet Centra (1993) spoke to the reality of the situation. He stated, “despite discrepancies in opinions and research findings on the validity of student evaluations, it is essential for faculty to understand that [SETs] are and probably will continue to be the primary institutional measure of their teaching effectiveness” (p. 30). In light of this reality, it behooves faculty to gain a greater understanding of student perspectives regarding SETs.
College administrators and professors alike question the accuracy of SETs, requiring a clearer understanding of how students perceive SETs and how they approach these measures. More specifically, demographic variables (e.g., student gender, age, class rank, and major) may be indicative of student perceptions and potentially influence student responses on teacher evaluations. Gender, in particular, is a common demographic variable explored in various areas of research. Muñoz-Silva, Sánchez-García, Nunes, and Martins (2007) maintain that the effects of gender differences on perceptions and subsequent behaviors are well established. In their study, the authors utilized the Theory of Planned Behavior and the Theory of Reasoned Action models to explore how gender may influence behavior. While existing SET research explores the effect instructor gender may have on student perceptions (Centra & Gaubatz, 2000; Chamberlin & Hickey, 2001; Goldberg & Callahan, 1991; Moore, 1997; Martin, 1984), little research has been conducted on how student gender might influence the manner in which students complete evaluations of college instructors.
Due to limited research on student perceptions, and the absence of research in the academic area of recreation, we conducted a study to explore student perspectives regarding the uses of and rationale for SETs in a recreation curriculum. Utilizing the Theory of Planned Behavior (TpB), we examined student perceptions of SETs and how these perceptions may influence their responses (behavior) on SETs. For example, did students perceive they had intentionally provided inaccurate responses (positive or negative) on SETs? During this investigation, relationships among student perspectives and demographic variables (gender, class rank, recreation major/minor, and age) were explored and tested for significance. The study hypothesized that these independent variables would not influence student perceptions of SETs. Based on the study results, gender differences yielded the most significant findings. Thus, the present investigation discusses the findings relating specifically to gender differences in student perceptions of SETs in a recreation curriculum.
Review of Literature
Beginning in the 1920s, various forms of student evaluations of teachers (SETs) were utilized. During the mid-1960s, research on the teacher evaluation process itself became relevant to those in academia, in large part due to self-preservation. The use of SET data had begun to play a major role in the advancement of collegiate instructors regarding tenure, promotion, pay raises, and job security (Centra, 1993). Given the significance SET data could play in the livelihood of those in academia, the usefulness and accuracy of these data sets are of natural interest to academic researchers. Over the past 25 years, the focus of the research has been on the validity of SETs, reaching a peak around the 1980s (Greenwald, 1997). Two questions have largely prevailed: Are students qualified to evaluate their teachers, and if so, are they mindful when they evaluate them? Since SET data are often used in the advancement of an instructor’s career, and the data are a direct interpretation of student opinion, it is important to know the answers to these questions. Some researchers strongly question the ability of students to provide unbiased responses on SETs (Bodle, 1994; Chonko et al., 2002; Simpson & Siguaw, 2000), whereas other research supports the validity of SET data (d'Apollonia & Abrami, 1997; Hobson & Talbot, 2001; Odden, 2004; Overall & Marsh, 1980). While arguments both for and against the validity of SETs are prevalent in the literature, the debate continues without a clear victor. Notably, the majority of universities continue to utilize student data in making administrative decisions about faculty promise and performance (Isely & Singh, 2005; Read & Raghunandan, 2001). The value of SET data and how effectively these ratings are used, known as consequential validity, is the focus of McKeachie’s (1997) work. He contended that it is not the validity of SETs that should be the focus of concern, but the consistency among administrators in how the data are interpreted.
Researchers have implemented a variety of methodological approaches to unravel the factors that influence SET responses. However, SET research continues to face a great deal of scrutiny regarding the methodology, evidence, and conclusions generated by scholars because the results have varied (Chamberlin & Hickey, 2001; Feldman, 1997; Marsh, 1987; Marsh & Roche, 1997). Due to the vast discrepancy in research findings, the value of utilizing SET data in tenure and promotion decisions has fostered even more concern among those in academia (Cashin & Downey, 1992). Ultimately, the goal of the bulk of SET research has been to clarify the validity of these instruments. The majority of interest in SET data appears to stem from the magnitude of the role these data have played in administrative decisions regarding tenure and promotion, pay raises, and security of academic appointments (Colbeck, 2002). Seldin (1980) cautioned against the use of SETs in academic decisions by stating, “We wish to emphasize that student ratings of undergraduate teaching fall far short of complete assessment of an instructor’s teaching contribution” (p. 65). If indeed university administrators are going to continue to utilize SET data, which are derived solely from student opinion and may have a serious impact on an instructor’s career, it is vital to understand the student perspective of SETs and how student demographic variables may be indicative of SET responses.
Gender and SETs
Numerous researchers have indicated gender can play a role in SET responses (Centra & Gaubatz, 2000; Dukes & Victoria, 1989; Feldman, 1993; Moore, 1997; Martin, 1984); however, the focal point of these studies was the influence of instructor gender rather than student gender. Nonetheless, the results were significant, as they indicated gender could be an influential factor in SET responses. Specifically, female faculty are expected by students to be friendly and warm (Martin, 1984) and to encourage questions (Feldman, 1993). Moore’s (1997) study revealed male faculty have been described by students as more scientific and knowledgeable, whereas Centra and Gaubatz (2000) found female students in particular perceived female instructors to be more organized and to possess stronger communication skills. As with most SET research, conflicting evidence does exist regarding gender and its influence on SET responses, as both Dukes and Victoria (1989) and Feldman found no differences regarding instructor gender and SET scores. However, Centra and Gaubatz noted these inconsistencies are probably due to design flaws. Feldman supported this notion, as his review of 10 SET and gender studies noted a lack of control for course and discipline among the studies.
In an effort to explore the relationship between gender and SET responses, Chamberlin and Hickey (2001) surveyed 198 students (84 males, 114 females) at the end of the academic semester, using a questionnaire consisting of items inquiring about the instructor’s highest degree received, rank and tenure status, and teaching style. The researchers found that “…female students rated male and female faculty as significantly different while male students did not differentiate between male and female faculty on [the] same items” (Chamberlin & Hickey, p. 12). Specifically, female students found female instructors to be more assertive when presenting material, more sympathetic in providing feedback, more sensitive to student needs, more helpful with student problems, more likely to encourage classroom discussion, and less likely to make students feel inferior than male instructors (Chamberlin & Hickey). These findings are somewhat disconcerting to both male and female faculty: if women are expected to have better rapport, then students may expect more from a female than from a male instructor, effectively holding a female teacher to a higher standard of rapport, and vice versa for a male instructor.
While the findings from the aforementioned research are informative and signify gender may indeed play a role in SET data, student perceptions of SETs, and how these perceptions may affect the responses students provide, remain a crucial piece of the puzzle that is still missing. Specifically, do male and female students perceive SETs differently? Additionally, recreation programs within the United States can be heavily skewed with regard to the gender of the students enrolled. For example, therapeutic recreation programs are well known to enroll a much larger female student population; a national therapeutic recreation curriculum study conducted by Stumbo, Carter, and Kim (2004) reported that, on average, 86.3% of therapeutic recreation students were female. Faculty are aware that other programs may hold higher male student enrollments as well.
Supporting the notion that gender does influence SETs, both a comprehensive literature review by Morgan and Ogden (1981) and an empirical study by Darby (2006) maintain student gender should be considered when reviewing SET data. Building upon the review of Morgan and Ogden, Darby found that females and males responded differently on evaluations. Darby’s study examined 504 male and female students to uncover whether gender differences existed in students’ evaluative abilities. In the first part of the study, Darby tested the subjects in areas linked to evaluation abilities such as notability, recall, and relation to others. Males scored lower than females in all three areas. In part two of her study, Darby accounted for variables often thought to be linked to gender and SETs: she controlled for instructor gender, class size, course content, teaching skill, instructor attractiveness, and evaluative method. Likert-scale evaluations showed no gender differences in response to instructors, whereas females tended to be more positive than males on open-ended evaluations. When both scale types were combined, females were generally more favorable in their responses than were their male counterparts. Darby’s work underscores the fact that gender differences do exist in how males and females respond to their instructors, but that this pattern is not completely straightforward. Based on the information gathered in this review and the findings of our larger study on student perceptions of SETs, it is apparent that one must have a clear understanding of both the student perspective of SETs and the role gender may play. Thus, the focus of the present investigation is to explore the influence gender may have on student perceptions of SETs.
Instrumentation
The survey instrument, titled the Student Perceptions of Student Evaluations of Teachers (SPSET) questionnaire, was designed for the current study based upon the work of Smith and Carney (1990) and incorporated the Theory of Planned Behavior (Ajzen & Fishbein, 2005). Smith and Carney surveyed students’ perceptions of teacher evaluations in introductory psychology and education courses using a 31-item questionnaire. The SPSET questionnaire contained 52 close-ended questions, including continuous and categorical measurement levels, in the following five categories: (a) demographics, (b) knowledge of purposes (uses) of SETs, (c) seriousness with which students respond to SETs, (d) perceived value of SET feedback, and (e) accuracy of SET responses. The Theory of Planned Behavior (TpB) assists in explaining the importance of understanding student perceptions of SETs. The purpose of TpB is to predict and understand attitudes and behavior. According to the theory, behavioral intent is the vital determinant of a person's actions. A combination of subjective norms and one’s attitude toward performing the behavior makes up an individual’s intention to perform a behavior. Behavioral and normative beliefs, subjective norms, evaluations of behavioral outcome, and motivation to comply generate one’s attitude toward the behavior (Ajzen & Fishbein). Thus, the TpB was used as the basis of the structure for the SPSET questionnaire to examine how student perceptions of SETs are influenced by their attitudes, subjective norms, perceived behavioral control, and intentions, which in turn ultimately influence their behavior (actual responses) on SETs.
As recommended by Ajzen (2002), the questionnaire items were delivered in a randomized order to avoid systematic responses by subjects. To test the accuracy of responses, it was necessary to ask selected questions in multiple ways to obtain a reliable self-report measure; thus, several questions targeting the same information using various positive and negative anchors were included. These positive and negative anchors, also known as semantic differentials, were presented in an opposite manner for each question that was repeated (e.g., definitely true – definitely false; extremely unlikely – extremely likely). Moreover, these anchors used active-voice verbiage with a degree of novelty to reduce mindless, repetitive responses. To measure belief strength and outcome evaluation, responses used a unipolar seven-point scale. While a five-point scale could have been used, a seven-point scale assisted with identifying variability and differences in responses; increased answer choices allowed for a clearer examination of true differences rather than a limited or lumped choice format (Ajzen). Therefore, both verbiage and visual differentiation were implemented to assist with reliability.
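The randomized item ordering and reversed semantic-differential anchors described above can be sketched as follows. This is an illustrative sketch only: the item texts, codes, and function names are hypothetical, not the actual SPSET items.

```python
import random

def reverse_score(response: int, scale_max: int = 7) -> int:
    """Re-key an item answered on a 1..scale_max semantic differential whose
    anchors were presented in the opposite direction (e.g., 'definitely
    false' on the left instead of 'definitely true')."""
    return scale_max + 1 - response

# Hypothetical item pool; paired items target the same information with
# opposite anchors, as the SPSET did. The boolean flags reversed anchors.
items = [
    ("K1", "Instructors use SET data to improve courses", False),
    ("K1r", "Instructors ignore SET data when revising courses", True),
    ("S1", "I take completing SETs seriously", False),
]

def administer(items, seed=None):
    """Return a copy of the item pool in randomized order, so that no two
    subjects see the items in the same systematic sequence."""
    order = items[:]
    random.Random(seed).shuffle(order)
    return order

# A response of 6 on a reversed-anchor item corresponds to 2 on the
# original direction of the 7-point scale.
print(reverse_score(6))
```

Re-keying the reversed items before analysis puts all paired questions on a common scale, which is what makes the cross-check of response accuracy possible.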
Pre-Testing the Instrument
The validity and reliability of the SPSET questionnaire were tested using both a panel of experts and a pilot study. The panel of experts consisted of three individuals: the Executive Associate Dean, who oversees the pedagogical aspects of the School of HPER; the Undergraduate Coordinator, who oversees the curriculum of the recreation department; and the Director of the Center for Evaluation. The panel reviewed the questionnaire for content, reliability, and clarity of word choice. Based upon the feedback provided by the panel, four questions were deleted and four questions were added, with minor adjustments made for clarity of word choice, leaving the SPSET at 52 items.
The questionnaire was then pilot tested using a sample of 58 undergraduate students in a health class consisting of recreation, health, and kinesiology majors at a Carnegie Extensive Research institution. Subjects of the pilot test were asked to identify confusing or inappropriate items, ask questions for clarification, and provide feedback regarding clarity of the items on the questionnaire. Reliability analysis for internal consistency resulted in a Cronbach’s alpha coefficient of .810 for the overall instrument, indicating the questionnaire was consistent in measuring student perceptions of teacher evaluations. Cronbach’s alpha coefficients were also obtained for each of the outcome categories: knowledge α = .673, seriousness α = .866, and accuracy α = .686. Initially, the reliability analysis of items in the value of feedback category resulted in a Cronbach’s alpha coefficient of .476. Upon removal from this category of a question regarding the seriousness with which instructors review SET data, the Cronbach’s alpha coefficient increased to .663. Because of the dramatic change in coefficients associated with this item, it was determined that subjects may have perceived the item differently than the researchers originally intended; consequently, the item was removed from that section. Given the acceptable overall Cronbach’s alpha coefficient (α = .810) and an appropriate sample size, the results of the pilot test indicated that only a few adjustments to the SPSET were necessary. As a result, minor editing took place for clarity of word choice on a few of the items. Once those changes were made, the SPSET remained at a length of 52 items and was ready for distribution.
Data collection
The questionnaire was designed to capture the perspectives of students enrolled in recreation courses at a Carnegie Extensive research university located in the Midwestern region of the United States. Letters were distributed to all course instructors during the 2006 spring semester requesting about 15 minutes of class time to administer the survey. Undergraduate students enrolled in 2- and 3-credit-hour recreation courses offered during the spring semester were targeted. Thirty-four of 36 instructors gave their consent to have the survey distributed; as a result, 34 courses involving approximately 1,682 students were eligible to participate in the study. Because the probability was high that students were enrolled in more than one recreation course during data collection, students were asked to complete the questionnaire only once. In total, 523 subjects completed the questionnaire.
Statistical analysis
Descriptive statistics (frequencies and measures of central tendency) were computed for each item in the survey and used to organize and characterize the data. Chi-square Tests of Independence were conducted for each Likert-scale statement in the questionnaire to identify relationships between the predictor (demographic) variables (i.e., gender, recreation major/non-major, age, and class status) and the criterion variables. The Chi-square Tests of Independence indicated whether the frequencies of subject responses were greater than could be expected by chance alone. SPSET items found to be significant (p < .01) indicated the variables were related to one another. However, the Chi-square value alone merely establishes whether a significant relationship exists among variables; it indicates neither the strength of that relationship nor the specific relationship among variables. Thus, once items were found to be significant, Cramér’s V was used to ascertain the strength of the relationship between variables (Pett, 1997). Additionally, standardized residuals (+/- 2.33 or greater) were examined as a post-hoc analysis to determine where the significant differences emerged within the Chi-square values, as recommended by Pett.
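The analysis pipeline described above (Chi-square Test of Independence, then Cramér's V for strength, then standardized residuals for cell-level differences) can be sketched with `scipy`. The counts used here are the men/women responses to SPSET item 38 as reported in Table 4; the variable names and print formatting are our own:

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2 x 3 contingency table: rows = men, women; columns = disagree,
# sometimes, agree (SPSET item 38 counts, as reported in Table 4).
observed = np.array([
    [114, 96, 9],    # men
    [222, 78, 3],    # women
])

# Chi-square Test of Independence (no Yates correction for tables
# larger than 2 x 2).
chi2, p, df, expected = chi2_contingency(observed)

# Cramér's V: strength of association for an r x c table.
n = observed.sum()
v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))

# Standardized residuals: cells beyond +/- 2.33 locate the differences.
std_resid = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.6f}, V = {v:.2f}")
print(np.round(std_resid, 2))
```

Run on these published counts, the sketch reproduces a chi-square statistic and Cramér's V in line with the values reported for item 38 in the Results section.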
Because multiple tests of significance were used in the analysis, adjustments were made for the replication and accumulation of Type I error. A standard Bonferroni correction would require the alpha level to be set at .002. Yet setting the alpha level too stringently leads to a greater likelihood of a Type II error. Lipsey (1990) argues there is justification for relaxing error risk to accommodate limits of effect size. In the current analysis, a Bonferroni correction adjustment with 20 degrees of freedom (df = 20) would increase the likelihood of a Type II error. As a result, the decision was made to set the level of significance at .01, reducing the possibility of a Type I error while not overcorrecting and allowing a greater possibility of a Type II error.
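The trade-off being weighed above can be made concrete: a Bonferroni correction simply divides the family-wise alpha by the number of comparisons. The text does not state the exact number of comparisons, so the counts below are illustrative; note that 25 tests at a family-wise α of .05 reproduces the .002 per-test level mentioned above:

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests

# With a family-wise alpha of .05, the per-test level shrinks quickly
# as the number of significance tests grows, raising Type II risk.
for m in (5, 10, 20, 25):
    print(m, bonferroni_alpha(0.05, m))
```

The compromise taken in the study, fixing the per-test level at .01 rather than the full Bonferroni value, sits between the uncorrected .05 and the fully corrected .002.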
Results
The study consisted of 523 subjects between the ages of 18 and 54 years across the following academic ranks: freshmen (16.8%, n = 88), sophomores (25.6%, n = 134), juniors (29.8%, n = 156), and seniors (27.7%, n = 145). Freshmen represented the smallest percentage of undergraduates because recreation is often a ‘discovery’ major, meaning students frequently do not enroll in the major until their sophomore or junior years on campus. Women represented 57.9% of the sample (n = 303) while men represented 42.1% (n = 220). Additionally, subjects were categorized as recreation majors/minors (70.2%) and non-recreation majors/minors (29.8%), as not all students enrolled in recreation courses are recreation majors or minors. All demographic data are summarized in Table 1.
Table 1. Undergraduate subject demographics

Demographic                       f       %
Undergraduate                   523     100
  Freshman                       88    16.8
  Sophomore                     134    25.6
  Junior                        156    29.8
  Senior                        145    27.7

  Male                          220    42.1
  Female                        303    57.9

  Recreation Major/Minor        367    70.2
  Non-Recreation Major/Minor    156    29.8

Age
  18-19                         127    24.3
  20                            147    28.1
  21                            127    24.3
  22-54                         122    23.3

Note. N = 523.
Of all the demographic data collected in this study, gender appeared to have the most significant impact on student perceptions of SETs. The data were organized and analyzed categorically, as it was hypothesized there would be no relationship between gender and subjects’ knowledge of the uses and purposes of SETs, the seriousness with which they completed SETs, their perception of the value of their feedback on SETs, or their perceptions of the accuracy of their SET responses. What follows is a presentation of the findings in terms of knowledge, seriousness (attitude), value, and accuracy as each relates to gender.
Knowledge and gender
Chi-square Tests of Independence indicated gender was predictive of responses more than could be expected by chance alone for nine SPSET items in the knowledge of purposes and uses category (see Table 2).
Table 2 (Continued). SPSET knowledge by gender

                                                     Men            Women
Question / Response                                n      %       n      %      χ²

Question 14: Knew responses impact job performance                            9.92**
  False                                           17     7.7     17     5.6
  Sometimes                                       95    43.2     95    31.5
  True                                           108    49.1    190    62.9
  Total                                          220   100      302   100

Question 23a: Used to give instructors pay raises                             9.46**
  False                                          103    46.8    146    48.3
  Sometimes                                       90    40.9    141    46.7
  True                                            27    12.3     15     5.0
  Total                                          220   100      302   100

Question 23d: Used to make improvements in courses                           18.07***
  False                                           28    12.7     15     5.0
  Sometimes                                       95    43.2    104    34.6
  True                                            97    44.1    182    60.5
  Total                                          220   100      301   100

Question 23e: Used to make improvements in teaching style                    21.74***
  Unlikely                                        39    17.7     21     7.0
  Sometimes                                      111    50.5    136    45.0
  Likely                                          70    31.8    145    48.0
  Total                                          220   100      302   100

Note. *p < .05. **p < .01. ***p < .001.
Chi-square results indicated a significant but weak association between gender and the knowledge questions presented in Table 2. Examination of the standardized residuals revealed three items contained one or more cells at +/- 2.33 or greater. Although the majority of men and women indicated SETs were either not used (44.3%) or only used ‘sometimes’ (47.7%) in merit decisions, of those who did indicate SETs were used for faculty merit decisions, men were significantly more likely to do so than women (χ² = 9.466, p = .009, Cramér’s V = .13). Only a little more than half of respondents (53.6%) perceived that SET data were used by instructors to make course improvements. Of those subjects who indicated instructors did not use SETs to make course improvements, men were significantly more likely to do so than women (χ² = 18.077, p < .001, Cramér’s V = .18). Similar results were found for a question regarding teaching styles: men indicated their SET responses were more likely to be influenced by an instructor’s teaching style than women did (χ² = 21.749, p < .001, Cramér’s V = .20).

The following items did not contain cells with standardized residuals greater than +/- 2.33; however, these items were found to be significant, and their apparent relationships are presented. When subjects were asked whether instructors use SETs to make course improvements, men and women were similar when reporting ‘somewhat’ (57.4%). However, men reported ‘false’, and women reported ‘true’, more often than expected. In other words, although this item suggests that both genders held the perception that SETs are used ‘somewhat’ to make course improvements, the statistical data revealed females were significantly more likely than males to believe SETs were used to make course improvements. In a related item, most men and women indicated instructors do read SET data, yet more female subjects than expected stated yes to this question. This could be interpreted to mean women are more likely to believe that instructors care enough about students’ comments to read them. However, women were more likely than expected to indicate they did not know if university officials read SET data. Three SPSET items addressed how increased knowledge about how SET data are used may influence responses. The data revealed women were more likely to agree that increased knowledge would influence their SET responses on all three questions.
Seriousness and gender
Chi-square Tests of Independence indicated gender was predictive of responses more than could be expected by chance alone for three SPSET items that dealt with the seriousness with which students completed SETs (see Table 3).
Table 3. Seriousness by gender

                                                     Men            Women
Question / Response                                n      %       n      %      χ²

Question 11: Take time completing SETs                                        9.19**
  Few to None                                     42    19.1     31    10.2
  Sometimes                                      125    56.8    202    66.7
  Most of the time                                53    24.1     70    23.1
  Total                                          220   100      303   100

Question 28: If classmates leave, I will too                                 19.93***
  True                                            45    20.5     27     8.9
  Sometimes                                       86    39.1    104    34.3
  False                                           89    40.5    172    56.8
  Total                                          220   100      303   100

Question 36: I take completing SETs seriously                                16.65***
  True                                            47    21.4    112    37.0
  Sometimes                                      123    55.9    148    48.8
  False                                           50    22.7     43    14.2
  Total                                          220   100      303   100

Note. *p < .05. **p < .01. ***p < .001.
Chi-square results indicated a significant but weak association between gender and the SPSET seriousness questions. Examination of the standardized residuals revealed three items contained one or more cells at +/- 2.33 or greater. When asked about taking their time in completing SETs, 62.5% of the subjects indicated that only ‘about half the time’ do they take their time completing SETs. Additionally, men were more likely than women to indicate they did not take their time when completing SETs (χ² = 9.198, p = .010, Cramér’s V = .13). Furthermore, the majority of subjects indicated they would not leave the room if a classmate left before completing SETs (49.9%); however, of those who indicated they would leave the room, men were more likely to do so than women (χ² = 19.930, p < .001, Cramér’s V = .19). Finally, of those who indicated they took completing SETs seriously, men were less likely than women to do so, although the majority of subjects (51.8%) indicated they took SETs seriously only some of the time (χ² = 16.653, p < .001, Cramér’s V = .17).
Value of feedback and gender
Chi-square Tests of Independence indicated no significant relationship between gender and items in the SPSET value of feedback category. Thus, gender was not predictive of responses regarding subjects’ perceptions of how their feedback on SETs was perceived by instructors and administrators.
Accuracy and gender
Chi-square Tests of Independence indicated gender was predictive of responses more than could be expected by chance alone for the items presented in Table 4.
Table 4. Accuracy by gender

                                                     Men            Women
Question / Response                                n      %       n      %      χ²

Question 10: Intend to be fair and accurate                                  16.20***
  False                                           26    11.8     20     6.6
  Sometimes                                       98    44.5     98    32.3
  True                                            96    43.6    185    61.1
  Total                                          220   100      303   100

Question 16: My SET responses reflect exactly how I feel                      7.90**
  Disagree                                        27    12.3     17     5.6
  Sometimes                                      108    49.1    150    49.5
  Agree                                           85    38.6    136    44.9
  Total                                          220   100      303   100

Question 33: Time of day can influence                                        8.09**
  Disagree                                        95    43.2    169    55.8
  Sometimes                                      104    47.3    111    36.6
  Agree                                           21     9.5     23     7.6
  Total                                          220   100      303   100

Question 38: Tend to give females higher marks                               26.75***
  Disagree                                       114    52.1    222    73.3
  Sometimes                                       96    43.8     78    25.7
  Agree                                            9     4.1      3     1.0
  Total                                          219   100      303   100

Note. *p < .05. **p < .01. ***p < .001.
Chi-square results indicated a significant but weak association between gender and the SPSET accuracy items. Examination of the standardized residuals revealed three questions contained one or more cells at +/- 2.33 or greater. The data revealed men were less likely to indicate they intended to record SET responses that were fair and accurate (χ² = 16.207, p < .001, Cramér’s V = .17). Nearly half of subjects (49.3%) reported that their SET responses were exact indications of their opinions only ‘sometimes’, and 42.3% indicated ‘agree’. Of those who stated ‘disagree’, men were less likely to do so than women (χ² = 7.906, p = .019, Cramér’s V = .12). Results also depicted women as more likely than men to disagree that they would issue high SET marks to female instructors (χ² = 26.752, p < .001, Cramér’s V = .22). SPSET item 33 did not contain cells with standardized residuals greater than +/- 2.33; however, this item was found to be significant, and the apparent relationships are presented: women were less likely than men to indicate the time of day a course was offered could influence SET responses, as the majority of men (47.3%) indicated that the time of day a course was offered (e.g., an 8:00 a.m. class) could ‘sometimes’ influence their responses on SETs.
The findings of the present investigation revealed that gender might be indicative of how students perceive SETs, supporting the results of Darby's (2006) research on gender differences in evaluations. Based upon this finding, knowing the gender composition of students could help faculty better explain and interpret SET response patterns for their courses. Faculty who gain greater insight into student perceptions and the gender differences associated with them would likely have a better understanding of less-than-desirable SET responses (i.e., when instructors perceive and/or experience good rapport and classroom relationships with their students, yet their SET scores do not support that perception). However, pause is warranted when considering whether to collect demographic data (e.g., gender, age), as student identity could quickly be determined in courses with low enrollments. Could the benefits of collecting demographic data outweigh the costs? Perhaps a minimum enrollment could be required before demographic data are collected, to avoid divulging student identity. The remainder of the discussion focuses upon the categories of knowledge of purpose, seriousness, and accuracy of perceptions by gender, as significant differences were found in each of these areas.
In terms of students’ knowledge of how SET data are used, the students in this study knew very little about who, beyond their instructor, would see their feedback. This finding is particularly noteworthy, as instructors, especially those who teach upper-level courses to juniors and seniors, may assume that students already know how SET data might be used to evaluate them in the classroom. Marlin (1987) found a similar result in a study of student perceptions of SETs when he concluded that students perceive their input on SETs has little effect upon the careers of faculty. Instructors of college courses could help themselves by sharing with their students who (e.g., department chairs, deans) reads the SET data and what happens to the information after it is read. This could boost students’ knowledge of the purpose of SETs, which might in turn influence the seriousness and accuracy with which students complete the evaluations.
Another dimension of knowledge of the purpose of SETs was whether students perceived that their instructor read the evaluations. While women were more likely to believe their instructors actually used the feedback to improve their courses, overall the subjects in this study were skeptical that instructors made improvements based on student input. Again, Marlin’s (1987) study revealed a similar result when he stated students believed “nobody pays much attention nor does much as a result of the outcome of the evaluation process” (p. 714). The implication of this finding is that recreation faculty can help their students understand the importance of SETs by sharing in each class how the results of these evaluations are used.
Overall, the subjects in this study did not take SETs very seriously, especially when compared to the seriousness with which the data are reviewed by administrators and peers in making faculty promotion and tenure decisions. The data revealed that students take a serious attitude in completing SETs only ‘some of the time’ or ‘about half the time.’ Specifically, women were more likely than men to report that they attempted to complete SETs in a serious manner. Does this mean that administrators and instructors should concentrate more on the SET data provided by women? Obviously not, but this finding does imply that educators should consider obtaining more demographic data on SETs, as this information could place individual scores in a more accurate context. Perhaps an instructor receives SET scores far beyond those of a peer teaching a different section of the same course. While demographic data cannot explain truthfulness or the ultimate cause behind erratic scores, this information may lend some assistance if, for example, 86% of the students enrolled were women, as opposed to other courses taught by the same instructor where enrollment was 35% women and 65% men. While this example is somewhat exaggerated, gender is one of many demographics that could be collected and monitored on SETs. If various demographic variables were collected and monitored more closely, particularly in recreation curricula, a substantially better understanding of the student perspective of SETs, and how these perspectives may influence SET responses, might be gained. Including demographic variables as part of the SET instrument could ultimately lead to more precise interpretation of the data for both formative and summative evaluative purposes.
Do students provide accurate information on SETs? The findings of this study indicated the majority of students perceive they provided accurate and honest responses most of the time. However, some researchers argue that student expectations can influence SET responses (Andersen & Miller, 1997), which may contribute to grade inflation (Singleton, 1978; Sonner, 2000), ultimately yielding inaccurate SET data. A scenario such as receiving a lower test or assignment grade than expected might influence the accuracy of student responses, causing students to intentionally give their instructor lower ratings. Isely and Singh (2005) discovered a similar result when they found higher-than-expected grades did influence SETs. Chonko et al. (2002) concurred by stating, “The grade received in the class is likely to play a disproportionate role in the students’ evaluation of the course” (p. 279). The present investigation, however, could not confirm these findings.
The evidence presented in this study suggests that in some cases students provide less than accurate SETs. In particular, men indicated they were more likely than women to be influenced by an instructor’s teaching style. Teaching style is akin to one’s personality, and these traits and skills vary among instructors. Langbein (1994) noted that faculty personality traits appear to have the largest impact on SET responses, a finding supported by several researchers who agreed that instructor personality influences student SET responses (Ambady & Rosenthal, 1993; Cardy & Dobbins, 1986; Chonko et al., 2002; Clayson, 1999; Marsh & Roche, 1997; Williams & Ceci, 1997). As such, teaching style should not be the sole predictor of an effective teacher, and issuing lower SET scores based solely on one’s preference of teaching style is not necessarily indicative of poor teaching performance and is seemingly unfair. Indeed, some students may find certain teaching styles more favorable than others, and some teaching styles may foster more enriched learning experiences for some students; thus, a student’s preference of teaching style is not completely objective. For example, students may dislike an instructor because his or her teaching style is ‘too challenging’ for their liking, even though that challenge can yield a greater learning experience. Receiving a lower rating because of this type of teaching style is a biased result. Again, the implication of this finding is the necessity of the instructor acknowledging the importance of objective, sincere student feedback and honestly sharing with students the importance of their comments and ratings prior to the distribution of the evaluation form.
Based on the results of the present investigation, future research should examine student gender differences in SET responses and their impact on the accuracy of SET data. Additionally, other demographic variables such as cultural background and ethnicity might influence how students perceive the evaluation of their instructors, and future research should include an investigation of these variables. Further, a qualitative case study could examine student perspectives of SETs in recreation curricula, comparing those findings with the present investigation and exploring why students hold views of their own actions that differ from those of their peers.
The data from the present investigation yielded interesting results contributing to the existing SET literature. The student perspective, along with the role student demographics may play in SET responses, has been neglected by researchers. The results of the present investigation provided compelling data indicating that a relationship between student gender and student perceptions may exist, and suggested this relationship could influence student SET responses. Can we reach a better understanding of the accuracy of SET data and what, if anything, these data divulge about instructor performance? Based on the evidence presented in this study, a better understanding of student perspectives and of the gender patterns that may exist among SET responses indeed sheds light on the complicated puzzle of student evaluations of their college professors.

References
Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431-441.
Andersen, K., & Miller, E. D. (1997). Gender and student evaluations of teaching. Political Science & Politics, 30, 216-219.
Ajzen, I. (2002). Constructing a TpB questionnaire: Conceptual and methodological considerations. Boston: UMASS. Retrieved September 22, 2007, from http://www.people.umass.edu/aizen/pdf/tpb.measurement.pdf
Ajzen, I., & Fishbein, M. (2005). The influence of attitudes on behavior. In D. Albarracín, B. T. Johnson, & M. P. Zanna (Eds.), The handbook of attitudes (pp. 173-221). Mahwah, NJ: Erlbaum.
Barndt, R. J. (2001). Fiscal policy effects on grade inflation. Retrieved April 23, 2006, from www.newfoundations.com/policy/barndt.html
Bodle, J. V. (1994). Evaluating student evaluations: The search for perspective. Journalism Education, 49, 76-81.
Cardy, R. L., & Dobbins, G. H. (1986). Affect and appraisal accuracy: Liking as an integral dimension of evaluating performance. Journal of Applied Psychology, 71(4), 672-678.
Cashin, W. E., & Downey, R. G. (1992). Using global student rating items for summative evaluation. Journal of Educational Psychology, 84, 563-572.
Centra, J. A. (1993). Reflective faculty evaluation: Enhancing teaching and determining faculty effectiveness. San Francisco: Jossey-Bass.
Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? Journal of Higher Education, 71(1), 17-33.
Chamberlin, M. S., & Hickey, J. S. (2001). Student evaluations of faculty performance: The role of gender expectations in differential evaluations. Educational Research Quarterly, 25(2), 3-14.
Chonko, L. B., Tanner, J. F., & David, R. (2002). What are they thinking? Students' expectations and self-assessments. Journal of Education for Business, 77(5), 271-279.
Clayson, D. E. (1999). Students' evaluation of teaching effectiveness: Some implications of stability. Journal of Marketing Education, 21(1), 68-75.
Colbeck, C. L. (2002). Integration: Evaluating faculty work as a whole. New Directions for Institutional Research, 114, 43-52.
d'Apollonia, S. & Abrami, P. C. (1997). Navigating student ratings of instruction. American Psychologist, 52(11), 1198-1208.
Darby, J. A. (2006). Evaluating courses: An examination of the impact of student gender. Educational Studies, 32(2), 187-199.
Doyle, K. O. (1983). Evaluating teaching. Lexington, MA: Lexington Books.
Dukes, R. L., & Victoria, G. (1989). The effects of gender, status, and effective teaching on the evaluation of college instruction. Teaching Sociology, 17, 447-457.
Ellett, C. D., & Teddlie, C. (2003). Teacher evaluation, teacher effectiveness and school effectiveness: Perspectives from the USA. Journal of Personnel Evaluation in Education, 17(1), 101-108.
Feldman, K. A. (1993). College Students’ views of male and female college teachers: Part I – evidence from students’ evaluations of their classroom teachers. Research in Higher Education, 34(2), 151-211.
Feldman, K. A. (1997). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart, (Eds.), Effective teaching in higher education: Research and Practice: 369-395. New York: Agathon Press.
Goldberg, G., & Callahan, J. (1991). Objectivity of student evaluations of instructors. Journal of Education for Business, 66(6), 377-378.
Greenwald, A. G. (1997). Validity concerns and usefulness of student ratings of instruction. American Psychologist, 52(11), 1182-1186.
Hobson, S. M., & Talbot, D. M. (2001). Understanding student evaluations. College Teaching, 49(1), 26-31.
Isely, P., & Singh, H. (2005). Do higher grades lead to favorable student evaluations? Journal of Economic Education, 36(1), 29-42.
Langbein, L. L. (1994). The validity of student evaluations of teaching. PS, 27, 545-553.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage Publications.
Marlin, J. W. (1987). Student perceptions of end of course evaluations. Journal of Higher Education, 58(6), 704-716.
Martin, E. (1984). Power and authority in the classroom: Sexist stereotypes in teaching evaluations. Signs: Journal of Women in Culture and Society, 24, 128-133.
Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Education Research, 11(3), 263-353.
Marsh, H. W., & Roche, L. A. (1997). Making students' evaluations of teaching effectiveness effective. American Psychologist, 52(11), 1187-1197.
McKeachie, W. J. (1997). Student ratings: The validity of use. American Psychologist, 52, 1218-1225.
Moore, M. (1997). Student resistance to course content: Reactions to gender of the messenger. Teaching Sociology, 25, 128-133.
Morgan, B. B. & Ogden, G. D. (1981). Non-instructional correlates of student ratings: A brief review. International Review of Applied Psychology, 30(3), 409-427.
Muñoz-Silva, A., Sánchez-García, M., Nunes, C., & Martins, A. (2007, October). Gender differences in condom use prediction with Theory of Reasoned Action and Planned Behavior: The role of self-efficacy and control. AIDS Care, 19(9), 1177-1181.
Odden, A. (2004). Lessons learned about standards-based teacher evaluation systems. Peabody Journal of Education, 79(4), 126-137.
Overall, J. U., & Marsh, H. W. (1980). Students' evaluations of instruction: A longitudinal study of their stability. Journal of Educational Psychology, 72, 321-325.
Pett, M. A. (1997). Nonparametric statistics for health care research: Statistics for small samples and unusual distributions. Thousand Oaks, CA: Sage Publications.
Read, W. J., & Raghunandan, D. V. (2001, March/April). The relationship between student evaluation of teaching and faculty evaluations. Journal of Education for Business, 189-192.
Seldin, P. (1980). Successful faculty evaluation programs. Crugers, NY: Coventry Press.
Sidanius, J., & Crane, M. (1989). Job evaluation and gender: The case of university faculty. Journal of Applied Social Psychology, 19, 174-197.
Simpson, P. M. & Siguaw, J. A. (2000). Students’ evaluation of teaching: An exploratory study of faculty response. Journal of Marketing Education, 22, 199-213.
Singleton, R. (1978). Effects of grade inflation on satisfaction with final grade: A case of relative deprivation. The Journal of Social Psychology, 105, 37-41.
Smith, M. C., & Carney, R. N. (1990, April). Students’ perceptions of the teaching evaluation process. Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.
Sonner, B. (2000). A is for “adjunct”: Examining grade inflation in higher education. Journal of Education for Business, 5-7.
Spencer, K. J. & Schmelkin, L. P. (2002). Student perspectives on teaching and its evaluation. Assessment & Evaluation in Higher Education, 27(5), 397-409.
Williams, W. M., & Ceci, S. J. (1997). How’m I doing? Problems with students’ ratings of instructors and courses. Change, 29(5), 12-23.