## Admission blues: How to fix GRE Mathematics and tweak the Putnam Competition

I was thinking about the Putnam competition and the GRE Mathematics test in the context of graduate admissions. Are they useful? If yes, which one is more relevant? After crunching some numbers, I concluded that while they are useful to some extent, there are problems with both. Even worse, a number of students who fall in the gap between “very good” and “exceptional” are ill served by either.

#### 1. Graduate admissions in mathematics

As I mentioned in my earlier post, every year the US produces around 1,600 Ph.D.’s in mathematical sciences (math, applied math, statistics) from over 100 accredited programs, of which about 900 go to US citizens and permanent residents. If you restrict to mathematics alone, the numbers drop by about 25% to about 1,200. The overall 10-year completion rate is about 50% according to the Council of Graduate Schools study, so perhaps about 3,000-3,200 students start graduate programs every year.

As a general rule, graduate programs in mathematics explicitly ask for the GRE Subject test scores, but are often happy to hear about the Putnam results as well. In fact, some “how to” guides now recommend taking the Putnam exam (and Putnam prep classes!) on par with the GRE test and REU programs (see e.g. here and there). How the schools use the two kinds of data probably differs quite a bit, and that is the other side of our main question.

#### 2. GRE Mathematics Subject test in numbers

The GRE Subject tests are developed and administered by ETS, which is nominally a non-profit, but one with about 1 billion dollars in revenue. For a quick comparison with a for-profit, a non-profit and a public institution: New York Times Corp, Harvard and UCLA had 2.3, 3.7 and 4.3 billion dollars in 2011 operating revenues, respectively.

From the official GRE test preparation publication: “The questions are classified approximately as follows: *calculus* (50%), *algebra* (25%) and *other topics* (25%).” This is already unfortunate, but more on that later. Here are these “other topics”:

Introductory real analysis (sequences and series of numbers and functions, continuity, differentiability and integrability, elementary topology of R), discrete mathematics (logic, set theory, combinatorics, graph theory, and algorithms), general topology, geometry, complex variables, probability and statistics, and numerical analysis. The above descriptions of topics covered in the test should not be considered exhaustive […] (emphasis mine – IP)

The GRE Guide gives a value of 0.92 for the KR20 reliability coefficient, a solid figure suggesting that the test has many questions which separate strong students from weak ones. The students have 170 minutes for about 65 questions. The scores are on a scale from 200 to 990 and are rounded to the nearest multiple of 10, with a standard error of 31 points for a single score and 44 points for the difference between two scores. In other words, if I understand correctly (the guide is vague on this), one cannot reliably compare students whose scores differ by 50 points or less. I am doubtful most grad schools follow that.
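The two standard errors quoted above are consistent with each other under one common assumption, namely that the measurement errors of two test takers are independent and equally large, so that their variances add. Here is a minimal sketch of that arithmetic; the `distinguishable` helper and its one-standard-error threshold are my own illustration, not a rule taken from the guide.

```python
import math

# Standard error of a single scaled score, as quoted from the GRE guide.
SE_SCORE = 31.0

# Assumption: the two students' errors are independent with the same
# standard error, so the variances add and the SE grows by sqrt(2).
se_difference = math.sqrt(SE_SCORE**2 + SE_SCORE**2)

print(f"SE of a score difference: {se_difference:.1f}")  # 43.8, i.e. 44 after rounding


def distinguishable(score_a, score_b, threshold=se_difference):
    """Hypothetical rule of thumb: call two scores different only if they
    are more than one standard error of the difference apart."""
    return abs(score_a - score_b) > threshold


print(distinguishable(800, 840))  # False: a 40-point gap is within the noise
print(distinguishable(780, 850))  # True: a 70-point gap is not
```

A stricter reader would demand two standard errors (about 88 points) before calling two scores different, which would make even fewer applicants comparable.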

In the same GRE guide, ETS reports that there were about 12,800 test takers in four years (July 2008 to June 2011), roughly 3,200 a year. This loosely coincides with our graduate student data, as the students take on average one GRE Subject test. In other words, all students with GRE scores get accepted *somewhere*. So one should not be surprised to see a high correlation (but not necessarily causation) between grad school ranking and GRE Subject scores. Curiously, ETS’s own study says that GRE General scores are a very poor predictor of success in math graduate programs, at least when it comes to GPA and graduation rate.

So how do grad schools use the GRE Math scores? That’s very much unclear. Of course, all schools gather statistics like the average scores of those who applied, were admitted and/or accepted (reported to the dean, external department reviewers, the NRC study, the US News, etc.), but very few make them publicly available. In a rare moment of openness, Penn State admits what amounts to *not much use of GRE scores*: their average scores vary widely over the years, swinging from 650 to 890, with a positive trend in recent years. In a general MO discussion on this, Pete Clark writes that the University of Georgia does not require GRE Subject, so he looks for high GRE General scores. UCLA is a bit evasive: “those we offer admission to have GRE subject scores in or above the 80th percentile”, which according to the GRE chart amounts to a minimum of about 790, suggesting relevance. MIT is blunt but imprecise: “There is no minimum GRE test score required, but if the score on the math subject GRE is not very high, evidence of excellence must be present elsewhere in the application or in the letters of recommendation.” UPenn is actually helpful: “[GRE Math score] should be at least 750, though applicants with somewhat lower scores may be admitted if the rest of their application is sufficiently strong,” adding that the recent average score is 820. This all makes for a very foggy picture.

#### 3. Putnam competition in numbers

The premise is simple: first Saturday in December, 6 hours (in two sittings) to solve 12 problems in all areas of mathematics, maximum of 10 points per problem. Joe Gallian wrote a nice summary. The problems are difficult: the maximal score of 120 is achieved only very occasionally, once in about 10-15 years. The median score is often 0, 1 or 2 (out of 120!), and the mean is between 5 and 10 points. I bet it must be depressing to spend 6 hours and get no or almost no points.

The top 5 scorers are “Putnam Fellows”, another 18-20 are “in the money”, and about 50-60 get “honorable mention”. In 2011, there were “4,440 students from 572 colleges and universities in Canada and the United States”. The historical data shows that there is a clear correlation between doing well on the Putnam and doing well in mathematics, which is even more pronounced for the top 25, and especially for the Putnam Fellows.

Of course, the competition is not aimed at helping graduate admissions, as emphasized by the mid-March results date (way after the applications are due and the admission decisions are made). It does not even make the scores available in any official format. In fact, historically, it is primarily a team competition, a nerdy alternative to college athletics. Finally, competing is not necessarily similar to doing research. As Kedlaya said, “A contest problem is meant to be solved in the space of minutes or hours, whereas in research, one sometimes works on the same problem for days, months, occasionally even years.”

#### 4. A bissel of analysis

**(a)** **GRE Math.** While useful to some extent, mostly for the middle and bottom scoring students, it is largely useless for most of the better prepared students. Indeed, in the “upper middle range” of the 75th to 90th percentile, the test scores range between 770 and 850, comprising about 500 students every year. By the rules of GRE, many of these students cannot even be compared. For those who can, it is unclear whether they really are better candidates for doing research and teaching in mathematics. Indeed, the excessive emphasis on calculus, real analysis and linear algebra tests the student’s ability to memorize concepts and quickly perform routine tasks. This does not test problem solving. Neither do the “other topics”, which heavily test definitions of a group, ring, metric space, etc. I bet the performance in this part strongly correlates with the quality of the undergraduate institution: better colleges offer more serious math classes, and GRE Math preparation classes, which cover these basic topics; others do not.

For the top 10%, the GRE Math scores do distinguish between the students, but that’s hardly necessary. Of the top 250-300 students, over half are international and often come with accolades like “the best student in N years from the XYZ university.” Last year I recall even one European student described as “the best student since World War II from … country”. The 100-150 who are from the US are well served by numerous REU programs, both national and at their home universities, by the Budapest and Moscow semesters, Putnam, IMO and other competitions, etc. Their GRE scores seem irrelevant in retrospect.

Now, using the AMS Classification, Group I of 48 top math graduate programs graduates about 550 Ph.D.’s. All are research oriented. I am guesstimating that they must be accepting c. 800 students in total. So after the top 300 are accepted, how are they supposed to choose the next 500 if the GRE is irrelevant?
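For those who want to check the head counts in this section, here is a back-of-the-envelope sketch. It assumes about 3,200 test takers a year (the figure from section 2) and treats percentile bands as simple fractions of that total; the function name `band_size` is mine.

```python
# Inputs taken from the text; all of them are rough estimates.
TEST_TAKERS_PER_YEAR = 3200   # about 12,800 test takers over four years
GROUP_I_ADMITS = 800          # the guesstimate for Group I admissions
TOP_STUDENTS = 300            # upper end of the "top 250-300"


def band_size(lower_pct, upper_pct, total=TEST_TAKERS_PER_YEAR):
    """Number of test takers between two percentiles, assuming the
    percentiles are computed over the same yearly pool."""
    return round(total * (upper_pct - lower_pct) / 100)


upper_middle = band_size(75, 90)           # the 770-850 score range
top_decile = band_size(90, 100)            # the top 10%
remaining = GROUP_I_ADMITS - TOP_STUDENTS  # seats left after the top students

print(upper_middle, top_decile, remaining)  # 480 320 500
```

So the “about 500” in the upper middle range and the 500 remaining Group I seats are the same order of magnitude, which is exactly the band where score differences of 50 points or less cannot be trusted.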

**(b) Putnam.** Even though a majority receive only single digit scores, there is a clear benefit for the top programs to know who the winners are. The top 25 individuals clearly possess excellent problem solving abilities, which are useful in a number of areas of mathematics. There are multiple problems with this. First, it would be nice to have the list of winners available by December. Second, it would be nice if the Putnam were offered overseas. But even for the US/Canada based students, as it stands, the senior year performance is not counted in admissions due to calendar issues. Since students are often encouraged to take their junior year abroad, the best performance they can include in their applications is from their sophomore year, which is often inferior to their senior year performance. So with the exception of the truly top students, Putnam results are not used in admissions.

#### 5. A modest proposal, Russian style

**(a) GRE Math.** Split the GRE Mathematics into two parts. Keep Calculus/Linear Algebra in the first half, more or less in the same multiple choice form as you have now. It is clearly helpful for middle and bottom tier students and programs. For the second part, make it a no-hard-math-required problem solving style. Make many relatively simple problems, much much simpler than IMO problems, more like Moscow olympiads for the freshman-sophomore HS years (8-9th year out of 11). This would allow relatively unbiased testing of problem solving, extremely useful to mathematics programs. Both scores would need to be reported (kind of like 4 GRE General scores).

As the revenue figures suggest, ETS is essentially a large utility company which does not want to rock the boat. But it has made changes before, and this particular change would be relatively painless and have the added advantage that no “other fields” need to be argued about – all students will know exactly what the scope of the test is.

**(b) Putnam.** Ugh. It’s true that “if it ain’t broke, don’t fix it”, so I don’t want to propose major changes. Just three minor tweaks, which will not change the core competition, but hopefully will make it more democratic and helpful for graduate admissions.

* First, move the competition to late September, so the scores can be revealed before Jan 1. I really don’t see what exactly is hard about that. Perhaps, some Putnam prep classes will have to be moved to the Spring. So what?

* Second, open it to international students. I know, I know, time difference, language issues, etc. Whatever, keep it on US time and only in English, as it is now. If the overseas students want to participate, they might have to do this at night perhaps (simply allow unlimited tea, coffee and Red Bull). This is still better than not giving them an opportunity at all. Another issue is trust (in foreign faculty supervisors). For that, use technology. Reveal the problems on some website for all at once. Videotape what’s happening in all rooms where the competition is taking place. Have *all* solutions uploaded as .pdf files to the main server within minutes after the end of the competition (they should still be graded locally, with top scores re-graded at a central location). While some of this might be an obstacle for some universities in poor countries, the majority of foreign universities already have all the necessary technology to make this happen.

* Third, and most controversially, at least for the US/Canadian students allow an easy “parallel track”. That is, come up with substantially easier problems which can be administered at the same time in parallel. The students should be given a choice – either real problems which are hard, or easier problems which do not count. This would be good for students’ morale as a means to prevent the annual 40% of 0 scores, and the scores can be useful for admissions. I am modelling this on the widely successful Tournament of Towns, which has two levels and two tracks (harder and easier), see this problem archive.

**P.S.** Full disclosure: I took GRE Math in 1994 and received the maximal score available at that time. I recall finishing early, but missing a couple of problems, possibly due to some English language difficulties. I did not participate in the Putnam – was busy in Moscow. More recently, I also participated in graduate admissions, but everywhere above I made sure to use only open sources and no “inside information”.

## What’s the Matter with Hertz Foundation?

Imagine you have plenty of money and dozens of volunteers. You decide to award one or two fellowships a year to the best of the best of the best in math sciences. Easy, right? Then how do you repeatedly fail at this, without anyone noticing? Let me tell you how. It’s an interesting story, so bear with me.

**A small warning**. Although it may seem I am criticizing Hertz Foundation, my intention is to show its weakness so it can improve.

#### What *is* the Hertz Foundation?

Yesterday I wrote a recommendation letter to the Hertz Foundation. Although a Fellow myself, I never particularly cared for the foundation, mostly because it changed so little in my life (I received it only for two out of five years of eligibility). But I became rather curious as to what usually happens to Hertz Fellows. I compiled the data, and found the results quite disheartening. While it is perhaps excellent in other fields, I came to believe that Hertz does a barely mediocre job awarding fellowships in mathematics. And now that I think about it, this was all completely predictable.

First, a bit of history. John Hertz was the Yellow Cab founder and car rental entrepreneur (thus the namesake company), and he left a lot of money dedicated to education in “applied physical sciences”, now understood to include applied mathematics. What exactly *is* “applied mathematics” is rather contentious, so the foundation wisely decided that “it is up to each fellowship applicant to advocate to us his or her specific field of interest as an ‘applied physical science’.”

In practice, according to the website, about 600 applicants in all areas of science and engineering apply for a fellowship. Applications are allowed only in the senior year of college or the 1st year of grad school. The fellowships are generous and include both the stipend and the tuition; between 15 and 20 students are awarded every year. Only US citizens and permanent residents are eligible, and the fellowship can be used only in one of the 44 “tenable schools” (more on this below). The Foundation sorts the applications, and volunteers interview some of the applicants in the first round. In the second round, pretty much only one person interviews all who advanced, and the decision is made. Historically, only one or two fellowships in mathematical sciences are awarded each year (this includes pure math, applied math, and occasionally theoretical CS or statistics).

#### Forty years of Math Hertz Fellowships in numbers

The Hertz Foundation website has data on all past fellows. I compiled the data into the Hertz-list, which spans 40 years (1971-2010), listed by the year the fellowship ended, which usually but not always coincided with graduation. There were 67 awardees in mathematics, which makes it about 1.7 fellowships a year. The Foundation states that it awarded “over 1000 fellowships”, so I guess about 5-6% went into maths (perhaps fewer in recent years). Here is who gets them.

**1)** **Which schools are awarded?** Well, only 44 US graduate programs are allowed to administer the fellowships. The reasons (other than logistical) are unclear to me. Of those programs that are “in”, you have University of Rochester (which nearly lost its graduate program), and UC Santa Cruz (where rumors say a similar move had been considered). Those which are “out” include graduate programs at Brown, UPenn, Rutgers, UNC Chapel Hill, etc. The real distribution is much more skewed, of course. Here is a complete list of awards per institution:

MIT – 14

Harvard, Princeton – 8 each

Caltech, NYU – 7 each

Berkeley, Stanford – 5 each

UCLA – 3

CMU, Cornell, U Chicago – 2 each

GA Tech, JHU, RPI, Rice – 1 each

In summary, only 15 universities had at least one award (34%), and just 7 universities were awarded 54 fellowships (*i.e.* 16% of universities received 81% of all fellowships). There is nothing wrong with this *per se*, just a variation on the 80-20 rule, you might argue. But wait! The Hertz Foundation is a non-profit institution and the fellowship itself comes with a “moral commitment”. Even if you need to interfere with the “free marketplace” of acceptance decisions (see P.S. below), wouldn’t it be in the spirit of John Hertz’s original goal to make a special effort to distribute the awards more widely? For example, the Simons Foundation is not shy about awarding fellowships to institutions, many of which are not even on Hertz’s list.
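The summary percentages can be recomputed directly from the list of awards. The sketch below assumes that a line such as “Harvard, Princeton – 8” means 8 fellowships for each school (this is the reading under which the list adds up to 67), and that the denominator is the 44 eligible programs; the variable names are mine.

```python
# Awards per institution, transcribed from the list above.
# Assumption: a count listed for several schools applies to each of them.
awards = {
    "MIT": 14, "Harvard": 8, "Princeton": 8, "Caltech": 7, "NYU": 7,
    "Berkeley": 5, "Stanford": 5, "UCLA": 3, "CMU": 2, "Cornell": 2,
    "U Chicago": 2, "GA Tech": 1, "JHU": 1, "RPI": 1, "Rice": 1,
}
ELIGIBLE_PROGRAMS = 44  # programs allowed to administer the fellowship

total_awards = sum(awards.values())
universities_with_award = len(awards)

# The seven most awarded universities and their combined count.
top7_awards = sum(sorted(awards.values(), reverse=True)[:7])

share_with_award = universities_with_award / ELIGIBLE_PROGRAMS
share_top7_universities = 7 / ELIGIBLE_PROGRAMS
share_top7_awards = top7_awards / total_awards

print(total_awards, universities_with_award, top7_awards)  # 67 15 54
print(f"{share_with_award:.0%}")         # 34%
print(f"{share_top7_universities:.0%}")  # 16%
print(f"{share_top7_awards:.0%}")        # 81%
```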

**2)** **Where are they now?** After two hours of googling, I located almost all former fellows and determined their current affiliations (see the Hertz-list). I found that of the 67 fellows:

University mathematicians – 27 (40%)

Of these, work at Hertz eligible universities – 14, or about 21% of the total (excluding 3 overseas)

At least 10 who did not receive a Ph.D. – 15%

At least 13 are in non-academic research – 19% (probably more)

At least 8 in Software Development and Finance – 12% (probably more)

Now, there is certainly nothing wrong with directing corporate research, writing software, selling derivatives, designing museum exhibits, and even playing in a symphony orchestra or heading a real estate company, as some of the awardees do now. Many of these are highly desirable vocations. But really, was this what Hertz had in mind when dedicating the money? In the foundation’s language, “benefit us all” they don’t.

I should mention that the list of Hertz Fellows in Mathematics does include a number of great academic success stories, but that’s not actually surprising. Every US cohort has dozens of excellent mathematicians. But the 60% dropout rate from academia is very unfortunate, having only 21% working in “tenable universities” is dismaying, and the 15% dropout rate from graduate programs is simply miserable. Couldn’t they have done better?
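For the record, here is how the percentages above follow from the raw counts. This is only a sketch over the numbers already listed; several of the counts are lower bounds (“at least”), so the corresponding shares are lower bounds too, and the dictionary keys are my own labels.

```python
TOTAL_FELLOWS = 67

# Counts as reported in the list above; "at least" marks lower bounds.
outcomes = {
    "university mathematicians": 27,
    "at Hertz-eligible universities": 14,
    "no Ph.D. (at least)": 10,
    "non-academic research (at least)": 13,
    "software development and finance (at least)": 8,
}

shares = {label: count / TOTAL_FELLOWS for label, count in outcomes.items()}

for label, share in shares.items():
    print(f"{label}: {share:.0%}")

# Everyone who is not a university mathematician counts as having left academia.
left_academia = 1 - shares["university mathematicians"]
print(f"left academia: {left_academia:.0%}")  # 60%
```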

#### A bit of analysis

Every year, US universities award over 1,600 Ph.D.’s in mathematical sciences, of which over half go to US citizens (more if you include permanent residents, but those stats are not easily available). So they are choosing 1.7 out of over 800 eligible students. Ok, because of their “tenable schools” restriction this is probably more like 300-400. Therefore, less than half of one percent of potential applicants are awarded! For comparison, the Harvard College acceptance rate is 10 times that.
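The selection rate is easy to recompute for each pool size mentioned above. The sketch below uses 67 awards over 40 years and the three pool sizes from the text (800, and the 300-400 range); these pools are rough guesses, so the output should be read as an order of magnitude and not a precise rate.

```python
AWARDS_PER_YEAR = 67 / 40  # about 1.7 math fellowships a year

# Rough yearly pools of eligible students, as estimated in the text.
pools = {
    "all US citizens": 800,
    "tenable schools, high estimate": 400,
    "tenable schools, low estimate": 300,
}

rates = {label: AWARDS_PER_YEAR / size for label, size in pools.items()}

for label, rate in rates.items():
    # "Ten times that" is the comparison made with college admissions.
    print(f"{label}: {rate:.2%} selected, ten times that is {10 * rate:.1%}")
```

With these inputs the rate comes out between about 0.2% and 0.6%, so “about half of one percent” is the right ballpark for the restricted pool.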

Let me repeat: in mathematics, Hertz fellows drop out from their Ph.D. programs at a rate of 15%. If you look into the raw 2006 NRC data for graduation rates, you will see that many of the top universities have over 90% graduation rate in the math programs (say, Harvard has over 91%). Does that mean that Harvard on average does a better job selecting 10-15 grad students every year, while Hertz can’t choose one?

Yes, I think it does. And the gap is wider still, considering that Hertz has virtually no competition (NSF Fellowships are less generous in every respect). You see, people at Harvard (or Princeton, MIT, UCLA, etc.) who read graduate applications know what they are doing. They are professionals who are looking for the most talented mathematicians from a large pool of applicants. They know which letters need to be taken seriously, and which with a grain of salt. They know which undergraduate research experience is solid and which is worthless. They just know how things are done.

Now, a vast majority of Hertz interviewers are themselves former fellows, and thus about 95% of them have no idea about mathematics research (they just assume it’s no different from the research they are accustomed to). Nor does the one final interviewer, who is an applied physicist. As a result, they are to some extent flipping coins and rolling dice, in the hope that things will work out. You can’t really blame them – they simply don’t know *how to choose*. I still remember my own two interviews. Both interviewers were nice, professional, highly experienced and well intentioned, but looking back I can see that neither had much experience with mathematical research.

You can also see this lack of understanding of mathematical culture creeping into other activities of the foundation, such as the thesis prize award (where are the mathematicians?), etc. Of course a private foundation can award anyone it pleases, but it seems to me it would do much more good if only some special care were applied.

**A modest proposal**

There is of course, a radical way to change the review of mathematics applicants – subcontract it to the AMS (or IMA, MSRI, IPAM – all have the required infrastructure). For a modest fee, the AMS will organize a panel of mathematicians who will review and rank the applicants without interviewing them. The panel will be taking into consideration only students’ research potential, not the university prestige, etc. The Hertz people can then interview the top ranked and make a decision at the last stage, but the first round will be by far superior to current methods. Even the NSA trusts AMS, so shouldn’t you?

Hertz might even save some money it currently spends on travel and lodging reimbursements. The 13% operating budget is about average, but there is some room for improvement. Subcontracting will probably lead to an increase in applications, as AMS really knows how to advertise to its members (I bet you currently receive only about 40 mathematics applications, out of a potential 400+ pool). To summarize: *really*, Hertz Foundation, think about doing that!

**P.S.** It is not surprising that the 7 top universities get a large number of the fellowships. One might be tempted to assume that clueless interviewers are perhaps somewhat biased towards famous school names in the hope that these schools already made a good decision accepting these applicants, but this is not the whole story. The described bias can only work for the 1st year grad applicants, but for undergraduate applicants a different process seems to hold. Once a graduate school learns that an applicant received a Hertz Fellowship (or an NSF Fellowship, for that matter), it has every incentive to accept the student, as the tuition and the stipend are now paid by outside sources.

**P.P.S.** Of course, mathematicians’ review can also fail. Even the super prestigious AIM Fellowship has at least one recipient who left academia for bigger and better things.

**UPDATE** (April 15, 2019). Over the years since this blog post, I have been contacted by people from the Hertz Foundation board. I have also followed up on the story and the recent fellowship recipients. I am happy to say that the foundation implemented various important changes vis-à-vis math interviews, to visible effect. At the moment, the numbers are too small to report statistics, and the changes I know of are not public information. I concluded that my criticism no longer applies, a happy ending to the story. I now encourage everyone to support the foundation financially and to recommend that their best students apply.