We at Inform believe that data tells a story across every industry, and every week we'll round up the most interesting ones right here. This week: help with your holiday shopping, reducing greenhouse gases, spotting data fakery, a 19th-century data pioneer, and 50 years of health data in Bangladesh.
Overwhelmed by holiday shopping? An app built on big data might help.
Pulling in information from more than 10,000 sources, including "social media, major ecommerce sites, blogs, product reviews, and rankings," the app lists the 100 trendiest products in consumer electronics, toys, and health and fitness.
While some of the app's findings are obvious, others are less so. It's no surprise that Star Wars LEGO sets will be hot hot hot, but the app's data suggests that LEGO may not be able to keep up with demand, so anyone interested should buy early. The app also shows that smartphones haven't killed digital cameras: Instagram has renewed interest in higher-quality photography.
New Jersey's Public Service Electric & Gas (PSEG) has a three-year, $905 million plan to replace more than 500 miles of old, methane-leaking pipes, and is using big data to determine the most efficient way to spend its money and time.
Through a partnership with Google Earth Outreach and the Environmental Defense Fund, PSEG used a Google Street View car equipped with methane sensors to collect six months' worth of data from thousands of miles of roadway, and will use that data to prioritize and schedule pipe replacements.
Methane is a greenhouse gas, and the hope is that the effort will benefit the environment as well as public safety.
You can lie, but you can't hide.
Two Stanford researchers have identified telltale writing patterns in the work of scientists who lied about their data. To do so, they collected more than 200 papers that had been retracted from science journals between 1973 and 2013, and compared their writing to that of unretracted papers published in the same journals, over the same period, on similar topics.
Next, they measured the "level of fraud" in the papers using an "obfuscation index," which rated the amount of abstract language and jargon. The researchers hypothesized that obfuscated language goes hand in hand with fakery, and that a scientist trying to hide fraud might want to "obscure parts of the paper."
They found that fraudulent retracted papers scored higher on the obfuscation index: on average, each contained about 60 more jargon words than the unretracted papers.
While some might think of using data to prevent and understand crime as a modern phenomenon, the practice goes back to at least 1889.
African American journalist and activist Ida B. Wells examined 10 years' worth of Chicago Tribune reports on lynching and found a pattern that was surprising for the time. While many believed young black men were being lynched as punishment for rape and murder, their "crimes" often were not crimes at all: the stated reasons included "having a bad reputation," "writing an insulting letter," or nothing at all.
It's a lesson worth heeding today. Experts say that the media often focuses on a single incident that "looks good on TV," while data provides a fuller picture. On the other hand, data is rarely neutral and should be viewed in the context of larger conversations about race and community.
Matlab, the name of both a region and a research site in Bangladesh, has been collecting and analyzing census and health data from its residents for 50 years, and basic health there has improved dramatically as a result. For instance, in the 1960s many children in Matlab didn't survive into adulthood, while now more than 90% do.
Using data collected from residents, Matlab was also able to develop and test lifesaving treatments, such as zinc for childhood diarrhea and the low-cost oral rehydration solution for cholera victims, which has saved the lives of about 50 million people worldwide.
The data also allows for retrospective studies. One group of researchers wanted to know whether malnutrition in pregnant women affects their children's health in adulthood, and found that adult children of women who were pregnant during Bangladesh's 1974-75 famine were three times more likely to develop pre-diabetes.
Matlab is now finding that non-communicable diseases such as diabetes, heart disease, and cancer are the leading causes of death among residents. It's up to future Matlab generations to find the cures.