Not Big Enough For ‘Big Data’


On May 1, after a 90-day review ordered by President Obama, the White House released a 79-page report on "Big Data," a catchphrase that has become exceedingly popular over the last several years to refer to the copious amounts of information about us and our habits that are being collected, stored and analyzed with new digital technologies. It's why Amazon tried to sell me a 5-pound box of matzah when I purchased a new seder plate a couple of months ago. The theory is that if we collect enough information, and have the will and the expertise to properly analyze it, then we'll be able to create all sorts of new efficiencies, improve our behaviors and make better decisions for ourselves and our organizations.

Some Jewish organizations, attempting to keep up with what they see as "cutting edge" trends, have bought into this line of thinking. The Foundation for Jewish Camping hosted a conversation at their recent leaders' retreat on "how to use big data techniques to acquire new campers and maintain existing relationships with your current staff and valuable alumni." Gordon Hecker, the executive director of the Columbus Jewish Federation, has been quoted as saying, "Big Data is the way all large businesses are going. The Jewish community can hop on this train now or get left in the dust." And only a day after the release of the White House Big Data Report, the Jewish Federation of Greater Atlanta announced that they would be "embarking on an innovative 3-year capacity building program for the Jewish community" by using GrapeVine: "a new engine for the non-profit community to address its need to move into the Age of Big Data."

I am a huge proponent of data and research. Jewish organizations on the whole need to be more data-driven. I believe in the value of large-scale communal surveys like the Pew Portrait of Jewish Americans, or the National Jewish Population Studies, or local community surveys like the 2011 UJA-Federation Jewish Community Study of New York. But these studies are different from Big Data science. The former are based on years of refinement of techniques, specifically framed research questions and classic statistical models. The latter tends to grab as much data as possible from wherever it can and then use computer algorithms to seek out correlating trends.

This is often not as powerful as it seems. In 2009, Google engineers were able to track the spread of the flu virus simply by correlating what people searched for online with whether they had flu symptoms. This got them enormous accolades in the press, including a publication in the prestigious journal Nature. However, only a few years later, the algorithm stopped working. In 2013, Google Flu Trends overstated the number of cases in America by nearly a factor of two. Google neither knew nor cared what linked the search terms with the spread of flu. They were simply finding statistical patterns in the data. The same goes for Amazon, which, while correctly suggesting that I might want matzah with my seder plate, also encouraged me to purchase this handy-dandy Messianic Passover Seder Preparation Guide.

Big Data tempts us to believe that we can see everything from 30,000 feet. But often it just leads to apophenia: seeing patterns where there aren't any. The danger is especially acute with a small sample set, where chance alone readily produces correlations that look meaningful.
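A small simulation can make the point concrete. The sketch below is only an illustration, not anything Google or Amazon actually runs: it fills 30 columns with pure random noise and counts how many pairs of columns appear "strongly correlated" anyway. The column count, the 0.5 cutoff, the sample sizes and the function names are all arbitrary assumptions chosen for the example.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(n_samples, n_variables=30, threshold=0.5, seed=0):
    """Count pairs of unrelated noise columns whose |r| meets the threshold.

    Every column is independent random noise, so any "pattern" found
    here is spurious by construction.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_samples)]
        for _ in range(n_variables)
    ]
    return sum(
        1
        for a, b in combinations(columns, 2)
        if abs(pearson(a, b)) >= threshold
    )


# Same 30 noise variables, observed 10 times versus 1,000 times.
small = count_spurious(n_samples=10)
large = count_spurious(n_samples=1000)
print(f"10 observations:    {small} 'strong' correlations out of 435 pairs")
print(f"1,000 observations: {large} 'strong' correlations out of 435 pairs")
```

With only 10 observations per column, dozens of the 435 possible pairs typically clear the cutoff by chance alone; with 1,000 observations essentially none do. An algorithm that merely hunts for correlations cannot tell these accidents from real relationships.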

To do Big Data right, you really need an enormous amount of data. Netflix, for example, has 76,897 movie micro-genres built by a large group of people specially trained to watch movies and tag them with all kinds of metadata in a process so sophisticated and precise that taggers receive a 36-page training document telling them how to do so. Pair that with an audience of over 33 million users, each with an average of 200 reviews, and you can begin to do big data analysis. By way of contrast, the 2013 Pew Study is based on the answers of merely 5,191 respondents. Even if we were to merge every dataset in the Berman Jewish Policy Archive, our data still wouldn't be "Big" enough to glean any meaningful findings via these new techniques.

Like me, Harper Reed, the chief technical officer for Barack Obama's data-driven 2012 re-election campaign, is a big believer in the power of data and data analysis. Nonetheless, he has been quoted more than once describing Big Data as "male bovine feces." In his own words, "The 'Big' there is purely marketing. This is all fear… This is about you buying big expensive servers and whatnot." (You can watch him here. Start at about 14:20.)

Data quality matters far more than quantity, and the right analytical framework trumps anything else. Data, no matter how "Big," cannot represent all of the complexities of our work and our beliefs, and we cannot simply be slaves to an algorithm.

Russel Neiss is a Jewish educator, technologist and activist, and the coding monkey behind PocketTorah, The AlephBet App and a myriad of other Jewish ed tech initiatives.
