An Interview with Judah Diament
After graduating from Yeshiva University in 1996, Professor Judah Diament simultaneously attended graduate school at NYU’s Courant Institute of Mathematical Sciences and learned in the semikha program at RIETS. He then went to work at Hitachi Data Systems followed by Bell Atlantic for a year. Diament then spent thirteen years at IBM Research, eventually finding his way to Goldman Sachs. Finally, this past summer Diament became the co-chair of the Computer Science department with Dr. Kelly and the Program Director of the new Undergraduate Data Science program.
Dovid: Why did you want to leave industry and come back to Yeshiva?
Judah: The ultimate objective of everything that happens in a publicly traded corporation is pushing up the company’s stock price. Of course earning a living is an important religious obligation and as such can be imbued with meaning and value regardless of what job one finds oneself in, but it is still not the same as doing something that is inherently meaningful. As a teacher, by contrast, your primary job is chessed, i.e. helping students learn a field and get their career started the right way. As such, being a teacher has more inherent value and is therefore very attractive. There are side benefits to teaching in Y.U. as well. First, I get to be in the environment of Yeshiva and daven and learn in the beis medrash. Second, from a scientific perspective, in most corporations a computer scientist will largely deal with problems that have been solved previously and re-solve them with a practical twist that is required to meet the demands of the business. By doing so you can impact the business significantly, but it's not as interesting as trying to solve unsolved problems, which is really what doing research is about. As a professor, there is time over the course of the year to do research.
D: What is the new Data Science program going to be? Is this department/specialization going to be part of the computer science major, or will it incorporate other departments and majors?
J: First let’s talk about how data science is used in industry, since the program is being designed to position different students to succeed in different ways. There are three different types of roles one can play in applying data science to industry. At one extreme are businesspeople who want to gain insight from their data to solve their business problems; at the other extreme are mathematicians who create models that can extract useful information from data; in between the two extremes are computer scientists who know how to take the models that the mathematicians created and make them work consistently and at a very large scale in a way that meets the needs of a business. For example, a mathematician will be able to create a new pricing model for a financial firm, but if you need to run it 50,000 times a second and vary its behavior based on which user is invoking it, then you will need a computer scientist to build a scalable and reliable system around the model. In order for the computer scientist to build such a system, he must understand the mathematics well enough to know what the challenges are in making it scalable and reliable, and as such his education and training must include enough of the relevant areas of math.
Let’s return to the ultimate consumer of all this math and technology, the business user, who wants to leverage it all quickly and easily to discover useful information to propel his business forward. Perhaps the most common approach to giving a businessperson access to such things is to create a dashboard, which is a web app that allows a user to configure certain parameters of a computation and then displays a graph or table that describes the results. And, very often, the user can export the results in a form that would allow him to manipulate it in Excel. This approach is extremely limiting, however, both because the businessperson has to go back to the IT department every time he needs something new or different, and also because there is a limit to what you can do well in Excel. There is a need, therefore, to teach future business users enough programming and statistics to allow them to be more proactive and independent consumers of mathematical models and data. These users will not create fundamentally new models that no one ever thought of before, nor will they be able to build a large-scale system. However, they should not be dependent on the IT department to make controlled modifications to how data is sampled or consumed in order to glean new business insights.
Now we can understand why this program is not a new major but instead has to span three majors: BIMA, Computer Science, and Math. Dean Avi Giloni has already been building this up on the business side in the BIMA major, and a lot of the courses that teach students the skills they will need, such as programming in R and how to correctly sample data, are already in place. On the computer science side, we are adding two specialization tracks to the traditional core C.S. classes that we (and all C.S. departments across the country) have always taught: Distributed Systems and Data Science. In the Distributed Systems track, students will learn the parts of C.S. that make cloud computing, large-scale data science, and mobile apps and games that communicate with a server all possible. This is the technology/science that all of the newer tech companies – Google, Amazon, Facebook, Netflix, LinkedIn, etc. – depend on, both for their clouds as well as for doing data science at a large scale. In the Data Science track, students will learn enough about distributed systems to understand how they work and also enough additional math to understand how things like machine learning or statistical models written in R work. These will be the students who know how to take the mathematical models and make them run correctly on large distributed systems. In terms of new C.S. courses in these tracks, Professor Kelly will be teaching Data Visualization this spring semester and I’ll be teaching a two-part Distributed Systems course next year. We are also in the process of searching for an additional full-time C.S. professor to teach machine learning, and the math department is looking for an additional full-time professor to teach statistics and probability, i.e. the mathematical side of data science. The program will not all be built overnight, but over the next year and a half we hope to finalize the curriculum and roll out the various pieces.
D: Will BIMA students be able to take the new C.S. classes?
J: To take any of the new C.S. classes you would have to take Data Structures, Calculus, and Intro to Computer Science, at a minimum, and some will require Algorithms as well. Any BIMA student who takes those background C.S. courses could take these new C.S. courses as well. The first semester of the Distributed Systems course would be very productive for a BIMA major because it will be about the different types of distributed systems, the challenges they each address, and how you use them. This knowledge would be very useful for a BIMA major, but in order to understand the course, one will need a background in the basics of computer science.
D: For those who aren’t familiar with the history, when did computer science go from being the basic fundamentals that you mentioned are taught all over the country to now being a more complex, specialized field?
J: Although there have been subspecialties in computer science research going back at least to the 1970s, as recently as the mid-1990s one could learn the core of computer science in college and then go out and have a very successful career. By the early part of this century, though, companies became savvier in how they used technology and more ambitious regarding what they wanted to get out of their investment in IT. To meet those ambitions, they needed to hire people with deeper expertise in various subspecialties of C.S., and thus the specialization that almost always existed in the research lab extended out into industry.
D: Does everyone in the computer science major need to choose one of these tracks, or can they do the old core classes and not add this specialization on top of it?
J: We are not sure yet. We are still fleshing out what the tracks will actually look like and then have to decide if we will allow a student to not specialize.
D: In order to have these tracks in computer science, will students need to take a whole set of extra classes in addition to those currently required by the computer science major?
J: It will be roughly the same core classes, and the additional classes for these tracks will mostly replace other computer science electives. Nothing is written in stone yet, though, and it certainly could still be that the total number of required classes will increase.
D: What has been your role on the administrative side?
J: In building the program I have to look at the courses that will be in each track, what other top universities’ programs look like, what the market needs, etc. To help the program succeed, I put together an industry advisory board. This board includes, amongst others, an executive from IBM, the chief IT architect from Mayo Clinic, a manager of a data science team from Facebook, a research scientist from Amazon, a vice president from Goldman Sachs, the manager of a large distributed systems team at Google, and C-level executives from a number of startups. We are getting input from them as to what this program should be in order for our students to get the best jobs. I’ve had one-on-one conversations with many of them, and hopefully next semester we will sit down as a group to review and discuss all the details.
D: People have not been pleased with the Computer Science department in recent years. Do you think this program will make the Computer Science department more attractive and help bring up its standard?
D: So when is the official start date of the program?
J: The official start date is Fall 2017, but this spring Dr. Kelly will already be teaching Data Visualization, a new course which will be part of the Data Science track.