
OUR DATA CRITIQUE


Screenshot of the NYPL Menu dataset webpage.

With a dataset this large, some flaws are inevitable. Below is our critique of the dataset: what it could improve, what it lacks, how we planned to use it, and more.

The New York Public Library’s “What’s on the Menu?” transcription project features restaurant menu data spanning from the 1840s to the present. The dataset incorporates both general bibliographic information and the culinary and economic information from the menus. It catalogs descriptive data that allows us to observe historical patterns in food trends, the popularity of certain foods over time, and the price fluctuations of food items, among other patterns.

The dataset is sourced from the 45,000 menus in the Rare Books Division of the New York Public Library (NYPL). Since 2011, nearly a quarter of these have been digitized and put in the NYPL’s digital gallery. The menus in this project are first digitized and then loaded into a transcription queue for processing.  

The data from these menus are exported to spreadsheets, which are updated on the 1st and 16th of each month as new menus are transcribed. The spreadsheets are downloadable as four CSV files titled Dish, Menu, Menu Item, and Menu Page. The two key files are Dish and Menu. The “Dish” spreadsheet lists each dish along with how many menus it has appeared on, how frequently it appears, its first and last appearance, and its highest and lowest listed prices. The “Menu” spreadsheet holds data on the physical characteristics of each menu as well as information on its location, event, venue, and sponsor. All of this data is also available to the public through an application programming interface (API).

The spreadsheet version of the data provides an overwhelming amount of information in a rather simplified form, and with this simplification comes a lack of details, references, and connections between the four separate spreadsheets. The “Dish” spreadsheet records how frequently each dish has appeared in the dataset, but many of the same dishes were described differently. For example, “coffee”, “cup of coffee” and “large pot of coffee” were treated as different dishes, which makes the total appearance counts unreliable. In addition, the lowest and highest prices shown in the table do not indicate the currency. The currency column is incomplete and even miscategorizes “cents” as a currency. The prices also do not account for inflation or taxes. There is no classification of what type of item each dish is (e.g. beverage, dessert, main course). Because the spreadsheets contain no photos, it is difficult to visualize how the data looked on the actual menu in terms of its layout and formatting. The Menu Page spreadsheet does not state the unit of the menu dimensions; we are given only four-digit numbers. Lastly, the four spreadsheets are not explicitly connected: nothing states directly which menu a given dish belongs to, and it seems that the connection can only be deciphered through each item’s ID numbers.
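To illustrate the duplicate-dish problem described above, here is a minimal sketch of how name variants might be merged before counting appearances. The sample rows, the field names `name` and `times_appeared`, and the list of serving phrases are all assumptions made for illustration; a real cleanup of the Dish file would need a much longer, hand-checked list of variants.

```python
import re
from collections import defaultdict

# Hypothetical sample rows; the field names and counts are assumptions,
# not values taken from the actual Dish spreadsheet.
SAMPLE_DISHES = [
    {"name": "Coffee", "times_appeared": 8},
    {"name": "Cup of coffee", "times_appeared": 5},
    {"name": "Large pot of coffee", "times_appeared": 2},
    {"name": "Tea", "times_appeared": 4},
]

# Illustrative serving-size phrases to strip from the start of a name.
SERVING_PREFIX = re.compile(r"^(?:(?:large|small)\s+)?(?:cup|pot|glass)\s+of\s+")


def normalize_name(name):
    """Lowercase, collapse whitespace, and drop a leading serving phrase."""
    cleaned = re.sub(r"\s+", " ", name.strip().lower())
    return SERVING_PREFIX.sub("", cleaned)


def merged_counts(dishes):
    """Sum appearance counts across rows that normalize to the same name."""
    totals = defaultdict(int)
    for dish in dishes:
        totals[normalize_name(dish["name"])] += int(dish["times_appeared"])
    return dict(totals)


print(merged_counts(SAMPLE_DISHES))
```

With the sample rows, the three coffee variants collapse into a single entry. A rule-based approach like this is only a starting point, since it cannot tell when two differently worded names really are different dishes.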

The way the dataset is organized has ideological effects and serves an investigative purpose. For one, the ingredients and menus are organized by the year of their first occurrence. In practice, one could research the first occurrences of pizza being served in 19th-century New York. The dataset’s ontology structures the data by date in order to track the evolution of menu contents over time. In execution, this data can help chefs, scientists, and researchers track the cultural and dietary influence of the ingredients in question. For instance, the database can be used to trace when immigrant Chinese dishes began to appeal to the American palate and became the traditional Chinese take-out observed in the present. After tracking a dish by year, one could delve deeper by observing each menu’s geographic location. By extension, one could track a dish across both space and time through the New York Public Library’s menu database. One could also track whether the amount of a particular ingredient consumed increased or decreased. By tracking how often common ingredients appear, one could get an idea of what the diet was like at the time. For the purpose of our digital humanities project, we will be looking into the pollution of New York City’s lower Hudson Estuary, which led to a decline in oyster consumption in 1927. That decrease in consumption is, in turn, reflected in the menus saved by the New York Public Library. This gives us a chance to examine cultural, socio-political, geographical, and statistical changes in oyster dishes on menus over time, specifically from 1870 to 1970.
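As a rough sketch of the kind of tracking described above, the following example counts how many menus per decade list a dish containing a keyword such as “oyster”, by following ID numbers from dishes to menu items, to menu pages, to menus. The miniature tables and the column names (`id`, `dish_id`, `menu_page_id`, `menu_id`, `date`) are assumptions for illustration, and the dates are invented. The real spreadsheets may also contain missing or malformed dates that this sketch does not handle.

```python
from collections import defaultdict

# Hypothetical miniature versions of the four spreadsheets.
# Column names and all values are assumptions for illustration only.
DISHES = [
    {"id": 1, "name": "Oysters on half shell"},
    {"id": 2, "name": "Fried oysters"},
    {"id": 3, "name": "Coffee"},
]
MENU_ITEMS = [
    {"dish_id": 1, "menu_page_id": 10},
    {"dish_id": 2, "menu_page_id": 10},
    {"dish_id": 3, "menu_page_id": 11},
    {"dish_id": 1, "menu_page_id": 12},
    {"dish_id": 2, "menu_page_id": 13},
]
MENU_PAGES = [
    {"id": 10, "menu_id": 100},
    {"id": 11, "menu_id": 101},
    {"id": 12, "menu_id": 102},
    {"id": 13, "menu_id": 103},
]
MENUS = [
    {"id": 100, "date": "1895-03-01"},
    {"id": 101, "date": "1895-06-12"},
    {"id": 102, "date": "1931-11-20"},
    {"id": 103, "date": "1975-01-05"},
]


def menus_per_decade(keyword, start=1870, end=1970):
    """Count distinct menus per decade that list a dish containing keyword."""
    dish_ids = {d["id"] for d in DISHES if keyword in d["name"].lower()}
    page_to_menu = {p["id"]: p["menu_id"] for p in MENU_PAGES}
    menu_year = {m["id"]: int(m["date"][:4]) for m in MENUS if m["date"]}

    # Follow the ID chain: menu item -> menu page -> menu.
    matching_menus = set()
    for item in MENU_ITEMS:
        if item["dish_id"] in dish_ids:
            menu_id = page_to_menu.get(item["menu_page_id"])
            if menu_id in menu_year:
                matching_menus.add(menu_id)

    # Group the matching menus by decade, keeping only the chosen year range.
    counts = defaultdict(int)
    for menu_id in matching_menus:
        year = menu_year[menu_id]
        if start <= year <= end:
            counts[year // 10 * 10] += 1
    return dict(sorted(counts.items()))


print(menus_per_decade("oyster"))
```

Counting distinct menus rather than individual items keeps a menu with several oyster dishes from being counted more than once. Raw counts would still need to be compared against the total number of menus per decade, since the collection does not cover every period evenly.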

This dataset is largely built upon the contributions of a few outstanding individuals, which may leave relatively large gaps in the data. For example, Miss Frank E. Buttolph added more than 25,000 menus to the collection, which is more than half of the 45,000 items it currently holds. Unfortunately, dishes and menus that were not written in (American) English tend to be neglected in the collection. The dataset therefore reveals the subjectivity of its collectors and archivists: it was collected and created mainly to record dishes in the United States, especially in New York. Even though the menus and dishes include certain cuisines from Asia (such as Chinese food), most of them are still based in America, which means that they were introduced by immigrants and Americanized in various ways. As a result, despite its great coverage, this dataset, the largest collection of menus in the world, would not be an ideal guide to food from non-Western cultures.

In conclusion, the “What’s on the Menu?” transcription project offers a highly comprehensive guide for retracing historical dishes and menus across a large span of time. To make it more comprehensive still, the archivists might consider finding and providing more recent and international data, and performing in-depth manual data cleaning to make the final digitized project more accurate and coherent.
