
Article originally appeared in strata.oreilly.com

Data for the public good

From healthcare to finance to emergency response, data holds immense potential to help citizens and government.

Can data save the world? Not on its own. As an age of technology-fueled transparency, open innovation and big data dawns around the world, the success of new policy won’t depend on any single chief information officer, chief executive or brilliant developer. Data for the public good will be driven by a distributed community of media, nonprofits, academics and civic advocates focused on better outcomes, more informed communities and the new news, in whatever form it is delivered.

Advocates, watchdogs and government officials now have new tools for data journalism and open government. Globally, there’s a wave of transparency that will wash over every industry and government, from finance to healthcare to crime.

In that context, open government is about much more than open data — just look at the issues that surround it, including the nature of identity, privacy, security, procurement, culture, cloud computing, civic engagement, participatory democracy, corruption, civic entrepreneurship and transparency.

If we accept the premise that Gov 2.0 is a potent combination of open government, mobile, open data, social media, collective intelligence and connectivity, the lessons of the past year suggest that a tidal wave of technology-fueled change is still building worldwide.

The Economist’s support for open government data remains salient today:

“Public access to government figures is certain to release economic value and encourage entrepreneurship. That has already happened with weather data and with America’s GPS satellite-navigation system that was opened for full commercial use a decade ago. And many firms make a good living out of searching for or repackaging patent filings.”

As Clive Thompson reported at Wired last year, public sector data can help fuel jobs, and “shoving more public data into the commons could kick-start billions in economic activity.” In the transportation sector, for instance, transit data is open government fuel for economic growth.

There is a tremendous amount of work ahead in building upon the foundations that civil society has constructed over decades. If you want a deep look at what the work of digitizing data really looks like, read Carl Malamud’s interview with Slashdot on opening government data.

Data for the public good, however, goes far beyond government’s own actions. In many cases, it will happen despite government action — or, often, inaction — as civic developers, data scientists and clinicians pioneer better analysis, visualization and feedback loops.

For every civic startup or regulation, there’s a backstory that often involves a broad number of stakeholders. Governments have to commit to open up themselves but will, in many cases, need external expertise or even funding to do so. Citizens, industry and developers have to show up to use the data, demonstrating that there’s not only demand, but also skill outside of government to put open data to work in service accountability, citizen utility and economic opportunity. Galvanizing the co-creation of civic services, policies or apps isn’t easy, but tapping the potential of the civic surplus has attracted the attention of governments around the world.

There are many challenges for that vision to pass. For one, data quality and access remain poor. Socrata’s open data study identified progress, but also pointed to a clear need for improvement: Only 30% of developers surveyed said that government data was available, and of that, 50% of the data was unusable.

Open data will not be a silver bullet to all of society’s ills, but an increasing number of states are assembling platforms and stimulating an app economy.

Results-oriented mayors like Rahm Emanuel and Mike Bloomberg are committing to opening Chicago and opening government data in New York City, respectively.

Following are examples of where data for the public good is already having an impact upon the world we live in, along with some ideas about what lies ahead.

Financial good

Anyone looking for civic entrepreneurship will be hard pressed to find a better recent example than BrightScope. The efforts of Mike and Ryan Alfred are in line with traditional entrepreneurship: identifying an opportunity in a market that no one else has created value around, building a team to capitalize on it, and then investing years of hard work to execute on that vision. In the process, BrightScope has made government data about the financial industry more usable, searchable and open to the public.

Due to the efforts of these two entrepreneurs and their California-based startup, anyone who wants to learn more about financial advisers before tapping one to manage their assets can do so online.

 

Prior to BrightScope, the adviser data was locked up at the Securities and Exchange Commission (SEC) and the Financial Industry Regulatory Authority (FINRA).

“Ryan and I knew this data was there because we were advisers,” said BrightScope co-founder Mike Alfred in a 2011 interview. “We knew data had been filed, but it wasn’t clear what was being done with it. We’d never seen it liberated from the government databases.”

While they knew the public data existed and had their idea years ago, Alfred said it didn’t happen because they “weren’t in the mindset of being data entrepreneurs” yet. “By going after 401(k) first, we could build the capacity to process large amounts of data,” Alfred said. “We could take that data and present it on the web in a way that would be usable to the consumer.”

Notably, the government data that BrightScope has gathered on financial advisers goes further than a given profile page. Over time, as search engines like Google and Bing index the information, the data has become searchable in places consumers are actually looking for it. That’s aligned with one of the laws for open data that Tim O’Reilly has been sharing for years: Don’t make people find data. Make data find the people.

As agencies adapt to new business relationships, consumers are starting to see increased access to government data. Now, more data that the nation’s regulatory agencies collected on behalf of the public can be searched and understood by the public. Open data can improve lives, not least through adding more transparency into a financial sector that desperately needs more of it. This kind of data transparency will give the best financial advisers the advantage they deserve and make it much harder for your Aunt Betty to choose someone with a history of financial malpractice.

The next phase of financial data for good will use big data analysis and algorithmic consumer advice tools, or “choice engines,” to make better decisions. The vast majority of consumers are unlikely to ever look directly at raw datasets themselves. Instead, they’ll use mobile applications, search engines and social recommendations to make smarter choices.

There are already early examples of such services emerging. Billshrink, for example, lets consumers get personalized recommendations for a cheaper cell phone plan based on calling histories. Mint makes specific recommendations on how a citizen can save money based upon data analysis of the accounts added. Moreover, much of the innovation in this area is enabled by the ability of entrepreneurs and developers to go directly to data aggregation intermediaries like Yodlee or CashEdge to license the data.
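To make the idea of a “choice engine” concrete, here is a minimal sketch, with entirely hypothetical plans and usage numbers, of the kind of comparison such services run over a consumer’s own data. It is an illustration of the concept, not how Billshrink or Mint actually work internally.

```python
# A minimal "choice engine" sketch: given a consumer's own usage data
# (hypothetical numbers) and a set of published plans (also hypothetical),
# recommend the cheapest plan that covers that usage.

from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    monthly_price: float      # USD per month
    included_minutes: int
    overage_per_minute: float


def monthly_cost(plan: Plan, minutes_used: int) -> float:
    """Total cost of a plan for one month of observed usage."""
    overage = max(0, minutes_used - plan.included_minutes)
    return plan.monthly_price + overage * plan.overage_per_minute


def recommend(plans: list[Plan], minutes_used: int) -> Plan:
    """Pick the plan with the lowest total cost for the observed usage."""
    return min(plans, key=lambda p: monthly_cost(p, minutes_used))


if __name__ == "__main__":
    plans = [
        Plan("Basic 300", 29.99, 300, 0.45),
        Plan("Plus 900", 49.99, 900, 0.40),
        Plan("Unlimited", 69.99, 10_000_000, 0.0),
    ]
    usage = 740  # minutes in the last billing cycle, e.g. parsed from a call log
    best = recommend(plans, usage)
    print(f"Cheapest plan for {usage} minutes: {best.name} "
          f"(${monthly_cost(best, usage):.2f}/month)")
```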


Transit data as economic fuel

Transit data continues to be one of the richest and most dynamic areas for co-creation of services. Around the United States and beyond, there has been a blossoming of innovation in the city transit sector, driven by the passion of citizens and fueled by the release of real-time transit data by city governments.

Francisca Rojas, research director at the Harvard Kennedy School’s Transparency Policy Project, has investigated the dynamics behind the disclosure of data by transit agencies in the United States, which she calls one of the most successful implementations of open government. “In just a few years, a rich community has developed around this data, with visionary champions for disclosure inside transit agencies collaborating with eager software developers to deliver multiple ways for riders to access real-time information about transit,” wrote Rojas.

The Massachusetts Bay Transit Authority (MBTA) learned from Portland, Oregon’s TriMet that open data is better. “This was the best thing the MBTA had done in its history,” said Laurel Ruma, O’Reilly’s director of talent and a long-time resident of greater Boston, in her 2010 Ignite talk on real-time transit data. The MBTA’s move to make real-time data available and support it has spawned a new ecosystem of mobile applications, many of which are featured at MBTA.com.

There are now 44 different consumer-facing applications for the TriMet system. Chicago, Washington and New York City also have a growing ecosystem of applications.

As more sensors go online in smarter cities, tracking traffic patterns will enable public administrators to optimize routes, schedules and capacity, driving efficiency and a better allocation of resources.

Transparency and civic goods

As John Wonderlich, policy director at the Sunlight Foundation, observed last year, access to legislative data brings citizens closer to their representatives. “When developers and programmers have better access to the data of Congress, they can better build the databases and tools that let the rest of us connect with the legislature.”

That’s the promise of the Sunlight Foundation’s work, in general: Technology-fueled transparency will help fight corruption, fraud and reveal the influence behind policies. That work is guided by data, generated, scraped and aggregated from government and regulatory bodies. The Sunlight Foundation has been focused on opening up Congress through technology since the organization was founded. Some of its efforts culminated recently with the publication of a live XML feed for the House floor and a transparency portal for House legislative documents.

There are other horizons for transparency through open government data, which broadly refers to public sector records that have been made available to citizens. For a canonical resource on what makes such releases truly “open,” consult the “8 Principles of Open Government Data.”

For instance, while gerrymandering has been part of American civic life since the birth of the republic, one of the best policy innovations of 2011 may offer hope for improving the redistricting process. DistrictBuilder, an open-source tool created by the Public Mapping Project, allows anyone to easily create legal districts.

 

“During the last year, thousands of members of the public have participated in online redistricting and have created hundreds of valid public plans,” said Micah Altman, senior research scientist at Harvard University Institute for Quantitative Social Science, via an email last year.

“In substantial part, this is due to the project’s effort and software. This year represents a huge increase in participation compared to previous rounds of redistricting — for example, the number of plans produced and shared by members of the public this year is roughly 100 times the number of plans submitted by the public in the last round of redistricting 10 years ago,” Altman said. “Furthermore, the extensive news coverage has helped make a whole new set of people aware of the issue and has reframed it as a problem that citizens can actively participate in to solve, rather than simply complain about.”

Principles for data in the public good

As a result of digital technology, our collective public memory can now be shared and expanded upon daily. In a recent lecture on public data for public good at Code for America, Michal Migurski of Stamen Design made the point that part of the global financial crisis came through a crisis in public knowledge, citing “The Destruction of Economic Facts,” by Hernando de Soto.

To arrive at virtuous feedback loops that amplify the signals citizens, regulators, executives and elected leaders need to make better decisions amid a flood of information, data providers and infomediaries will need to embrace key principles, as Migurski’s lecture outlined.

First, “data drives demand,” wrote Tim O’Reilly, who attended the lecture and distilled Migurski’s insights. “When Stamen launched crimespotting.org, it made people aware that the data existed. It was there, but until they put visualization front and center, it might as well not have been.”

Second, “public demand drives better data,” wrote O’Reilly. “Crimespotting led Oakland to improve their data publishing practices. The stability of the data and publishing on the web made it possible to have this data addressable with public links. There’s an ‘official version,’ and that version is public, rather than hidden.”

Third, “version control adds dimension to data,” wrote O’Reilly. “Part of what matters so much when open source, the web, and open data meet government is that practices that developers take for granted become part of the way the public gets access to data. Rather than static snapshots, there’s a sense that you can expect to move through time with the data.”

The case for open data

Accountability and transparency are important civic goods, but adopting open data requires grounded arguments for a city chief financial officer to support these initiatives. When it comes to making a business case for open data, John Tolva, the chief technology officer for Chicago, identified four areas that support the investment in open government:

  1. Trust — “Open data can build or rebuild trust in the people we serve,” Tolva said. “That pays dividends over time.”
  2. Accountability of the work force — “We’ve built a performance dashboard with KPIs [key performance indicators] that track where the city directly touches a resident.”
  3. Business building — “Weather apps, transit apps … that’s the easy stuff,” he said. “Companies built on reading vital signs of the human body could be reading the vital signs of the city.”
  4. Urban analytics — “Brett [Goldstein] established probability curves for violent crime. Now we’re trying to do that elsewhere, uncovering cost savings, intervention points, and efficiencies.”

New York City is also using data internally. The city is doing things like applying predictive analytics to building code violations and housing data to try to understand where potential fire risks might exist.

“The thing that’s really exciting to me, better than internal data, of course, is open data,” said New York City chief digital officer Rachel Sterne during her talk at Strata New York 2011. “This, I think, is where we really start to reach the potential of New York City becoming a platform like some of the bigger commercial platforms and open data platforms. How can New York City, with the enormous amount of data and resources we have, think of itself the same way Facebook has an API ecosystem or Twitter does? This can enable us to produce a more user-centric experience of government. It democratizes the exchange of information and services. If someone wants to do a better job than we are in communicating something, it’s all out there. It empowers citizens to collaboratively create solutions. It’s not just the consumption but the co-production of government services and democracy.”

The promise of data journalism

The ascendance of data journalism in media and government will continue to gather force in the years ahead.

Journalists and citizens are confronted by unprecedented amounts of data and an expanded number of news sources, including a social web populated by our friends, family and colleagues. Newsrooms, the traditional hosts for information gathering and dissemination, are now part of a flattened environment for news. Developments often break first on social networks, and that information is then curated by a combination of professionals and amateurs. News is then analyzed and synthesized into contextualized journalism.

Data is being scraped by journalists, generated from citizen reporting, or gleaned from massive information dumps — such as with the Guardian’s formidable data journalism, as detailed in a recent ebook. ScraperWiki, a favorite tool of civic coders at Code for America and elsewhere, enables anyone to collect, store and publish public data. As we grapple with the consumption challenges presented by this deluge of data, new publishing platforms are also empowering us to gather, refine, analyze and share data ourselves, turning it into information.
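As a small illustration of the kind of scraping described above, here is a hedged sketch in the spirit of ScraperWiki-style data journalism: it pulls a public HTML table and republishes it as CSV. The URL and table layout are hypothetical placeholders, not a real government page, and real scrapers need to respect terms of use and robots.txt.

```python
# Fetch the first HTML table from a (hypothetical) public page and save it as CSV.
# Assumes the page contains at least one <table>; uses requests and BeautifulSoup.

import csv

import requests
from bs4 import BeautifulSoup

URL = "https://example.gov/spending/contracts.html"  # hypothetical source page


def scrape_table(url: str) -> list[list[str]]:
    """Return the first HTML table on the page as a list of rows."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    rows = scrape_table(URL)
    with open("contracts.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    print(f"Saved {len(rows)} rows to contracts.csv")
```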

There are a growing number of data journalism efforts around the world, including the award-winning investigative work of ProPublica. Here are just a few promising examples:

  • Spending Stories, from the Open Knowledge Foundation, is designed to add context to news stories based upon government data by connecting stories to the data used.
  • Poderopedia is trying to bring more transparency to Chile, using data visualizations that draw upon a database of editorial and crowdsourced data.
  • The State Decoded is working to make the law more user-friendly.
  • Public Laboratory is a tool kit and online community for grassroots data gathering and research that builds upon the success of Grassroots Mapping.
  • Internews and its local partner Nai Mediawatch launched a new website that shows incidents of violence against journalists in Afghanistan.

Open aid and development

The World Bank has been taking unprecedented steps to make its data more open and usable to everyone. The data.worldbank.org website that launched in September 2010 was designed to make the bank’s open data easier to use. In the months since, more than 100 applications have been built using the data.

“Up until very recently, there was almost no way to figure out where a development project was,” said Aleem Walji, practice manager for innovation and technology at the World Bank Institute, in an interview last year. “That was true for all donors, including us. You could go into a data bank, find a project ID, download a 100-page document, and somewhere it might mention it. To look at it all on a country level was impossible. That’s exactly the kind of organization-centric search that’s possible now with extracted information on a map, mashed up with indicators. All of sudden, donors and recipients can both look at relationships.”

Open data efforts are not limited to development. More data-driven transparency in aid spending is also going online. Last year, the United States Agency for International Development (USAID) launched a public engagement effort to raise awareness about the devastating famine in the Horn of Africa. The FWD campaign includes a combination of open data, mapping and citizen engagement.

“Frankly, it’s the first foray the agency is taking into open government, open data, and citizen engagement online,” said Haley Van Dyck, director of digital strategy at USAID, in an interview last year.

“We recognize there is a lot more to do on this front, but are happy to start moving the ball forward. This campaign is different than anything USAID has done in the past. It is based on informing, engaging, and connecting with the American people to partner with us on these dire but solvable problems. We want to change not only the way USAID communicates with the American public, but also the way we share information.”

USAID built and embedded interactive maps on the FWD site. The agency created the maps with open source mapping tools and published the datasets it used to make these maps on data.gov. All are available to the public and media to download and embed as well.

The combination of publishing maps and the open data that drives them simultaneously online is a significant step forward for any government agency, and it sets a worthy bar for future efforts to meet. USAID accomplished this by migrating its data to an open, machine-readable format.

“In the past, we released our data in inaccessible formats — mostly PDFs — that often cannot be used effectively,” said Van Dyck. “USAID is one of the premier data collectors in the international development space. We want to start making that data open, making that data sharable, and using that data to tell stories about the crisis and the work we are doing on the ground in an interactive way.”

Crisis data and emergency response

Unprecedented levels of connectivity now exist around the world. According to a 2011 survey from the Pew Internet & American Life Project, more than 50% of American adults use social networks, 35% of American adults have smartphones, and 78% of American adults are connected to the Internet. When combined, those factors mean that we now see earthquake tweets spread faster than the seismic waves themselves. Networked publics can now share the effects of disasters in real time, providing officials with unprecedented insight into what’s happening. Citizens act as sensors in the midst of the storm, creating an ad hoc system of networked accountability through data.

The growth of an Internet of Things is an important evolution. What we saw during Hurricane Irene in 2011 was the increasing importance of an Internet of people, where citizens act as sensors during an emergency. Emergency management practitioners and first responders have woken up to the potential of using social data for enhanced situational awareness and resource allocation.

An historic emergency social data summit in Washington in 2010 highlighted how relevant this area has become. And last year’s hearing in the United States Senate on the role of social media in emergency management was “a turning point in Gov 2.0,” said Brian Humphrey of the Los Angeles Fire Department.

The Red Cross has been at the forefront of using social data in a time of need. That’s not entirely by choice, given that news of disasters has consistently broken first on Twitter. The challenge is for the men and women entrusted with coordinating response to identify signals in the noise.

First responders and crisis managers are using a growing suite of tools for gathering information and sharing crucial messages internally and with the public. Structured social data and geospatial mapping suggest one direction where these tools are evolving in the field.

A web application from ESRI deployed during historic floods in Australia demonstrated how crowdsourced social intelligence provided by Ushahidi can enable emergency social data to be integrated into crisis response in a meaningful way.

The Australian flooding web app includes the ability to toggle layers from OpenStreetMap, satellite imagery, and topography, and then filter by time or report type. By adding structured social data, the web app provides geospatial information system (GIS) operators with valuable situational awareness that goes beyond standard reporting, including the locations of property damage, roads affected, hazards, evacuations and power outages.

Long before the floods or the Red Cross joined Twitter, however, Brian Humphrey of the Los Angeles Fire Department (LAFD) was already listening. “The biggest gap directly involves response agencies and the Red Cross,” said Humphrey, who currently serves as the LAFD’s public affairs officer. “Through social media, we’re trying to narrow that gap between response and recovery to offer real-time relief.”

After the devastating 2010 earthquake in Haiti, the evolution of volunteers working collaboratively online also offered a glimpse into the potential of citizen-generated data. Crisis Commons has acted as a sort of “geeks without borders.” Around the world, developers, GIS engineers, online media professionals and volunteers collaborated on information technology projects to support disaster relief for post-earthquake Haiti, mapping streets on OpenStreetMap and collecting crisis data on Ushahidi.

Healthcare

What happens when patients find out how good their doctors really are? That was the question that Harvard Medical School professor Dr. Atul Gawande asked in the New Yorker, nearly a decade ago.

The narrative he told in that essay makes the history of quality improvement in medicine compelling, connecting it to the creation of a data registry at the Cystic Fibrosis Foundation in the 1950s. As Gawande detailed, that data was privately held. After it became open, life expectancy for cystic fibrosis patients tripled.

In 2012, the new hope is in big data, where techniques for finding meaning in the huge amounts of unstructured data generated by healthcare diagnostics offer immense promise.

The trouble, say medical experts, is that data availability and quality remain significant pain points that are holding back existing programs.

There are, literally, bright spots that suggest what’s possible. Dr. Gawande’s 2011 essay, which considered whether “hotspotting” using health data could help lower medical costs by giving the neediest patients better care, offered another perspective on the issue. Early outcomes made the approach look compelling. As Dr. Gawande detailed, when a Medicare demonstration program offered medical institutions payments that financed the coordination of care for its most chronically expensive beneficiaries, hospital stays and trips to the emergency rooms dropped more than 15% over the course of three years. A test program adopting a similar approach in Atlantic City saw a 25% drop in costs.

Through sharing data and knowledge, and then creating a system to convert ideas into practice, clinicians in the ImproveCareNow network were able to improve the remission rate for Crohn’s disease from 49% to 67% without the introduction of new drugs.

In Britain, researchers found that the outcomes for adult cardiac patients improved after the publication of information on death rates. With the release of meaningful new open government data about performance and outcomes from the British national healthcare system, similar improvements may be on the way.

“I do believe we are at the beginning of a revolutionary moment in health care, when patients and clinicians collect and share data, working together to create more effective health care systems,” said Susannah Fox, associate director for digital strategy at the Pew Internet & American Life Project, in an interview in January. Fox’s research has documented the social life of health information, the concept of peer-to-peer healthcare, and the role of the Internet among people living with chronic disease.

In the past few years, entrepreneurs, developers and government agencies have been collaboratively exploring the power of open data to improve health. In the United States, the open data story in healthcare is evolving quickly, from new mobile apps that lead to better health decisions to data spurring changes in care at the U.S. Department of Veterans Affairs.

Since he entered public service, Todd Park, the first chief technology officer of the U.S. Department of Health and Human Services (HHS), has focused on unleashing the power of open data to improve health. If you aren’t familiar with this story, read the Atlantic’s feature article that explores Park’s efforts to revolutionize the healthcare industry through better use of data.

Park has focused on releasing data at Health.Data.Gov. In a speech to a Hacks and Hackers meetup in New York City in 2011, Park emphasized that HHS wasn’t just releasing new data: “[We're] also making existing data truly accessible or usable,” he said, taking “stuff that’s in a book or on a website and turning it into machine-readable data or an API.”

Park said it’s still quite early in the project and that the work isn’t just about data — it’s about how and where it’s used. “Data by itself isn’t useful. You don’t go and download data and slather data on yourself and get healed,” he said. “Data is useful when it’s integrated with other stuff that does useful jobs for doctors, patients and consumers.”

What lies ahead

There are four trends that warrant special attention as we look to the future of data for public good: civic network effects, hybridized data models, personal data ownership and smart disclosure.

Civic network effects

Community is a key ingredient in successful open government data initiatives. It’s not enough to simply release data and hope that venture capitalists and developers magically become aware of the opportunity to put it to work. Marketing open government data is what repeatedly brought federal Chief Technology Officer Aneesh Chopra and Park out to Silicon Valley, New York City and other business and tech hubs.

Despite the addition of topical communities to Data.gov, conferences and new media efforts, government’s attempts to act as an “impatient convener” can only go so far. Civic developer and startup communities are creating a new distributed ecosystem that will help create that community, from BuzzData to Socrata to new efforts like Max Ogden’s DataCouch.

Smart disclosure

There are enormous economic and civic good opportunities in the “smart disclosure” of personal data, whereby a private company or government institution provides a person with access to his or her own data in open formats. Smart disclosure is defined by Cass Sunstein, Administrator of the White House Office of Information and Regulatory Affairs, as a process that “refers to the timely release of complex information and data in standardized, machine-readable formats in ways that enable consumers to make informed decisions.”

For instance, the quarterly financial statements of the top public companies in the world are now available online through the Securities and Exchange Commission.

Why does it matter? The interactions of citizens with companies or government entities generate a huge amount of economically valuable data. If consumers and regulators had access to that data, they could tap it to make better choices about everything from finance to healthcare to real estate, much in the same way that web applications like Hipmunk and Zillow let consumers make more informed decisions.

Personal data assets

When a trend makes it to the World Economic Forum (WEF) in Davos, it’s generally evidence that the trend is gathering steam. A report titled “Personal Data: The Emergence of a New Asset Class” suggests that 2012 will be the year when citizens start thinking more about data ownership, whether that data is generated by private companies or the public sector.

“Increasing the control that individuals have over the manner in which their personal data is collected, managed and shared will spur a host of new services and applications,” wrote the paper’s authors. “As some put it, personal data will be the new ‘oil’ — a valuable resource of the 21st century. It will emerge as a new asset class touching all aspects of society.”

The idea of data as a currency is still in its infancy, as Strata Conference chair Edd Dumbill has emphasized. The Locker Project, which provides people with the ability to move their own data around, is one of many approaches.

The growth of the Quantified Self movement and online communities like PatientsLikeMe and 23andMe validates the strength of the movement. In the U.S. federal government, the Blue Button initiative, which enables veterans to download personal health data, has now spread to all federal employees and earned adoption at Aetna and Kaiser Permanente.

In early 2012, a Green Button was launched to unleash energy data in the same way. Venture capitalist Fred Wilson called the Green Button an “OAuth for energy data.”

Wilson wrote:

“It is a simple standard that the utilities can implement on one side and web/mobile developers can implement on the other side. And the result is a ton of information sharing about energy consumption and, in all likelihood, energy savings that result from more informed consumers.”

Hybridized public-private data

Free or low-cost online tools are empowering citizens to do more than donate money or blood: Now, they can donate time and expertise, or even act as sensors. In the United States, we saw a leading edge of this phenomenon in the Gulf of Mexico, where Oil Reporter, an open source oil spill reporting app, provided a prototype for data collection via smartphone. In Japan, an analogous effort called Safecast grew and matured in the wake of the nuclear disaster that resulted from a massive earthquake and subsequent tsunami in 2011.

Open source software and citizens acting as sensors have steadily been integrated into journalism over the past few years, most dramatically in the videos and pictures uploaded after the 2009 Iran election and during 2011’s Arab Spring.

Citizen science looks like the next frontier. Safecast is combining open data collected by citizen science with academic, NGO and open government data (where available), and then making it widely available. It’s similar to other projects, where public data and experimental data are percolating.

Public data is a public good

Despite the myriad challenges presented by legitimate concerns about privacy, security, intellectual property and liability, the promise of more informed citizens is significant. McKinsey’s 2011 report dubbed big data as the next frontier for innovation, with billions of dollars of economic value yet to be created. When that innovation is applied on behalf of the public good, whether it’s in city planning, transit, healthcare, government accountability or situational awareness, those effects will be extended.

We’re entering the feedback economy, where dynamic feedback loops between customers and corporations, partners and providers, citizens and governments, or regulators and companies can drive both efficiencies and leaner, smarter governments.

The exabyte age will bring with it the twin challenges of information overload and overconsumption, both of which will require organizations of all sizes to use the emerging toolboxes for filtering, analysis and action. To create public good from public goods — the public sector data that governments collect, the private sector data that is being collected and the social data that we generate ourselves — we will need to collectively forge new compacts that honor existing laws and visionary agreements that enable the new data science to put the data to work.

Open 311

3-1-1 is a number well known in some cities of the United States and Canada, where citizens can notify the authorities about non-urgent situations such as non-working traffic lights, illegal burning or roadway problems. The goal is to reserve 9-1-1 for emergencies that really need immediate attention.

Open 311 “provides open channels of communication for issues that concern public space and public services. Using a mobile device or a computer, someone can enter information (ideally with a photo) about a problem at a given location. This report is then routed to the relevant authority to address the problem. What’s different from a traditional 311 report is that this information is available for anyone to see and it allows anyone to contribute more information. By enabling collaboration on these issues, the open model makes it easier to collect and organize more information about important problems. By making the information public, it provides transparency and accountability for those responsible for the problem. Transparency also ensures that everyone’s voice is heard and in turn encourages more participation”.
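The Open311 GeoReport v2 specification standardizes this flow as a small web API. Below is a minimal sketch of filing a report against it; the base URL, API key and service code are hypothetical placeholders, since each city publishes its own endpoint and service list.

```python
# A minimal sketch of reporting a non-emergency issue through an Open311
# (GeoReport v2) endpoint. BASE and API_KEY are placeholders; real deployments
# publish their own endpoints, keys and service codes.

import requests

BASE = "https://city.example.org/open311/v2"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                      # issued by the city, if required


def list_services():
    """Fetch the service types (e.g. pothole, broken streetlight) the city accepts."""
    resp = requests.get(f"{BASE}/services.json", timeout=30)
    resp.raise_for_status()
    return resp.json()


def submit_request(service_code, lat, lon, description):
    """File a service request and return its tracking ID."""
    resp = requests.post(f"{BASE}/requests.json", data={
        "api_key": API_KEY,
        "service_code": service_code,
        "lat": lat,
        "long": lon,  # the GeoReport spec uses "long", not "lng"
        "description": description,
    }, timeout=30)
    resp.raise_for_status()
    # The spec returns a list with one service request object.
    return resp.json()[0].get("service_request_id")


if __name__ == "__main__":
    print([s["service_name"] for s in list_services()])
    request_id = submit_request("001", 41.8781, -87.6298,
                                "Traffic light not working")
    print("Filed service request:", request_id)
```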

Learn more

 

W3C Brazil launches Open Data portal in Latin America

Stemming from a project created in partnership with ECLAC, the website will be launched this Wednesday in Ecuador.

To contribute to the development of Open Data strategies that lead to accountability, innovative services and effective public policies, thereby promoting a more inclusive knowledge economy in Latin America and the Caribbean: this is the objective of the Open Data for the Development of Public Policies in Latin America and the Caribbean (OD4D) project, implemented in partnership with the International Development Research Centre (IDRC) of Canada, W3C Brazil and the Economic Commission for Latin America and the Caribbean (ECLAC).

The OD4D portal was created as a means of providing constant updates on the project and on the progress of the global debate on Open Data. It will be launched this Wednesday, October 10, in Quito, the capital of Ecuador, during a preparatory meeting for the IV Ministerial Conference on the Information Society in Latin America and the Caribbean (eLAC), promoted by ECLAC.

In order to contribute to the body of knowledge on Open Data and its potential to improve the quality of public policies in the region, the OD4D website compiles articles, documents, videos and a range of data on the topic. In addition to the content produced through the project (manuals, guides, scientific articles, lectures, seminars, workshops), the website also invites contributions from society to expand its content.

According to Vagner Diniz, manager of the W3C Brazil office, which will be launching the portal in Quito, the trilingual (Portuguese, Spanish and English) channel focuses on research into the impact of the use of open data on public policy-making and local economic development: “The idea is to promote debates, as well as to produce and share materials on the topic. The portal will share workbooks, manuals and references to several portals worldwide. It will work as an aggregator, i.e. a repository of information.”

Vagner also notes that, although the whole project is being developed in partnership with ECLAC, content management and production for the portal will be the sole responsibility of W3C Brazil, which is already known for its reference publications on Open Data.

About the W3C Brazil office – W3C.br
In line with CGI.br’s deliberations and the requirements set forth by the W3C (World Wide Web Consortium), NIC.br launched the W3C office in Brazil, the first in South America. The W3C is an international consortium aimed at promoting the realization of the Web’s full potential by creating standards and guidelines to ensure its constant development. Over 80 standards have already been published, including HTML, XML, XHTML and CSS. The W3C office in Brazil supports the global goals of a Web for all, from any device, based on knowledge, security and responsibility. More information available here.

About the Brazilian Network Information Center – NIC.br
The Brazilian Network Information Center (nic.br) is a civil, non-profit entity that implements the decisions and projects of the Brazilian Internet Steering Committee. The NIC.br is permanently in charge of coordinating the domain name registry – Registro.br, of studying, answering and dealing with security incidents in Brazil – CERT.br, of studying and researching network and operation technologies – CEPTRO.br; of producing indicators on information and communication technologies – CETIC.br; and, of hosting the W3C office in Brazil.

About the Brazilian Internet Steering Committee – CGI.br
The Brazilian Internet Steering Committee coordinates and integrates all Internet service initiatives in the country, promoting technical quality, innovation and awareness of the services on offer. Based on multilateral, transparent and democratic principles, CGI.br represents a multi-sector model of Internet governance, effectively involving all sectors of society in its decision-making processes. One of its publications is the “10 Principles of Internet Governance and Use”. More information available here.


“Most of the data stored by governments is not translated into information or services to the population”

Interview originally published in Blog Públicos – Estado de São Paulo

“Governments are not really aware of the amount and nature of the data they have stored. When they do have a rough idea, they lack the time to consider how that data can be applied and converted into services for the population.”

The general manager of the W3C consortium in Brazil, an international community of 300 private and state enterprises and universities that work together to develop Web standards, Vagner Diniz maintains in his interview with Públicos that governments must allow civil society to decide which public data are of interest to the population. He also believes that both parties must join forces to make the data supply meet the demand for information.

“We cannot just sit around waiting for the government to publish information, wasting money on data that might not even be of interest to the population. We will try to identify which data can be actually useful, create a demand for it and reach an agreement with government bodies to come up with a framework of priorities,” he says.
According to Diniz, civil society can spot possibilities in the data that are overlooked by governments. “Two hundred million people will see much more than 4 or 5 million civil servants.”

Why is it important for governments to publish their data in open formats?
The amount of data gathered and not used by governments ends up creating a useless mass of information. Governments use only the portion of the data that they need for administrative purposes. Most of it is not translated into information or services for the population. Governments are not really aware of the amount and nature of the data they have stored. When they do have a rough idea, they lack the time to consider how that data can be applied and converted into services for the population.

How important is this information to civil society?

What’s most important in making this information available is allowing the population itself to say: “This set of data might interest me, it is useful to me. Let me use it because I’ll be able to come up with scenarios in which it is relevant, while you as government have too many other concerns that prevent you from seeing what I can see.” In other words, it’s the idea that two hundred million people will see much more than 4 or 5 million civil servants. With governments worldwide starting to open their data, organizations, communities, interested individuals, Web programmers and volunteers have created interesting application software to make use of the data available.

What about governments?
Curiously, this has generated an exchange of data within governments themselves. Different government bodies now have access to information from other bodies, which was previously very difficult to obtain due to endless bureaucratic processes.

This will undoubtedly contribute to greater government efficiency. But how can we guarantee that the immense supply of data stored by governments will meet society’s demand for information?
That is a tough task which I do not expect to see easily accomplished. Reaching an ideal stage of free-flowing information from government to society will be a hard process. It will involve raising awareness. There is a lot of resistance to publishing public data because the government sees itself much more as a proprietor than a custodian of that data. Public data are public, they belong to the population, and governments are custodians of data, but they act like proprietors. They fear what will be done to “their” data. A second effort involves qualification, as publishing these data in open formats demands a certain degree of technical expertise. We have to study the technologies that allow data to be openly published on the Internet. We must train people to do this.

Now…
…lastly, there must be an open and frank dialogue between the custodians of the data, the government bodies, and those interested in having access to the data, civil society organizations and many private citizens. We will try to address priorities. We cannot just sit around waiting for the government to publish information, wasting money on data that might not even be of interest to the population. We will try to identify which data can be actually useful, create a demand for it and reach an agreement with government bodies to come up with a framework of priorities.

You once mentioned that developing application software is much easier than gathering consistent data. Could you explain this?
Developing an application based on available data merely involves writing code that any slightly experienced web developer can read and freely apply to his own application. It is quite simple, much like creating a Web page. You don’t even have to be a Web developer to create a Web page nowadays, thanks to the tools available. Publishing data in an open format is more complicated, given that you, as the custodian of that data, have many other concerns besides the technical aspect of making the data available. It’s about more than that…

Yes…
…you have to make sure that the data is consistent. There cannot be another dataset with information that clashes with the data being published. You will publish three, four, ten databases, and any similar information they contain cannot be inconsistent. Secondly, there are security issues you need to worry about. You cannot allow the person who will use the data to alter them in any way. Thirdly, the data being published must be certified. Because if someone happens to misuse these data and alter them in any way, and then claim to have obtained the information from a government website, you, as the publisher, can prove that the original data were altered by that person. So there are many aspects to be considered when making information available.

Can you give an interesting example of data inconsistency?
I had an experience as IT director of a city in the state of São Paulo. A typical case was the city’s streets register. Each city hall department had its own register, with data boxes tailored to the needs of each department. The finance department’s register was geared towards collecting property tax, while the register of the public roads department focused on road works. The legal department was more focused on executing outstanding debts, and so forth. I counted six or seven registers. All of them had different information about the same streets. Even worse, the street names also differed among the registers, with different abbreviations. You never knew if a street in one register was the same as in another. It was also impossible to unify these registers, as they had different formats. This poses a serious problem when the information is made available, as different registers show the same information in different ways.

This reveals not only the size of the problem, but also the growing need to standardize government information.
Absolutely. This has been critical since the adoption of information technology in the organization of corporations. The need for standardization goes way back. Professionals in the area joke that the purpose of information technology is not to help you get better organized, but to help you make the same blunders you used to do without it (laughs). When you computerize an environment without altering processes and standardizing information, you will just do the same things you did before, but more quickly.


Can the private sector benefit from open data? If so, how?

I believe so, although the private sector has not yet realized this. It can benefit greatly in many areas of the open data value chain, especially technology businesses. One example is publishing open data on the Web. Moreover, creative and innovative businesses will scrutinize the open data carefully and be able to find ways to reuse and transform these data into commercially valuable services.

Can you give an example?
Nowadays, the IBGE Census is a rich source of information. It contains a lot of data on the country, its citizens, their distribution and characteristics. If these data are made available they can be extremely useful, provided the right to confidentiality of personal data is ensured. Based on them you could, for example, offer consultancy services for new businesses based on socioeconomic profiles; you could also give advice on which businesses are in demand based on household profiles. There is another example in operation in Brazil called Gas Finder, a mobile phone application that allows users to locate nearby gas stations. It is extremely useful and was developed using data available on the website of the National Oil Agency. You don’t necessarily have to generate income by charging the customer directly; income may be generated from ads displayed with the information. All it takes is entrepreneurship and creativity.

Paper by Felipe Heusser. Participate!

Among the expected results of OD4D is the preparation of documents that will serve as the theoretical and methodological basis for the project. That documentation will therefore focus on producing further knowledge on Open Data and its potential to improve the quality of public policies in Latin America and the Caribbean.

The idea is that the writings take into account the international literature that evaluates the institutional context and technological conditions required for open data initiatives.

The first document is being prepared by Felipe Heusser, founder and director of Fundación Ciudadano Inteligente, a Latin American NGO based in Chile that uses information technology to promote transparency and active citizen participation. The paper, Understanding Open Government Data, is a work in progress, and you can participate by commenting on the draft and thus assisting in its preparation.

You can also download the file here to read it and use the pad for comments.

The 5 stars of Open Data

When we talk about Open Data strategies that are farther reaching than publishing information, we may introduce the concept of Linked Data into the debate or go even further: Linked Open Data (LOD).

In the words of Tim Berners-Lee, the inventor of the World Wide Web, “Linked Open Data is Linked Data which is released under an open license”. Linked Data does not always have to be open. However, Linked Open Data does. Linked Open Data may only be referred to as such if it is open. And, aiming to promote this type of data, Tim Berners-Lee suggests a 5-star rating system.

This rating system awards a star to initiatives that make information publicly available in open format. More stars are awarded progressively based on how open and accessible the data analyzed is:

★ Available on the Internet (in any format, e.g. PDF), provided it is released under an open license, to qualify as Open Data

★★ Available on the Internet as machine-readable structured data (e.g. an Excel file with an XLS extension)

★★★ Available on the Internet as machine-readable structured data in a non-proprietary format (CSV instead of Excel)

★★★★ All of the above, plus the use of W3C open standards (RDF and SPARQL): use URIs to identify things, so that people can point at your data

★★★★★ All of the above, plus: link your data to other people’s data to provide context
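To make the four- and five-star steps more concrete, here is a minimal sketch using the rdflib Python library: it describes one record with URIs (RDF) and links it to an external dataset (DBpedia) for context. The dataset URI, property names and values are illustrative, not a real published dataset.

```python
# Describe one item as RDF and link it to external Linked Data (the 5th star).
# Requires the rdflib package; rdflib >= 6 returns serialize() output as a string.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, XSD

EX = Namespace("http://data.example.org/schools/")   # hypothetical dataset namespace
QUITO = URIRef("http://dbpedia.org/resource/Quito")  # someone else's data

g = Graph()
school = EX["school-42"]
g.add((school, RDF.type, EX.School))
g.add((school, RDFS.label, Literal("Escuela Municipal 42", lang="es")))
g.add((school, EX.enrolledStudents, Literal(612, datatype=XSD.integer)))
g.add((school, EX.locatedIn, QUITO))  # the link that provides external context

print(g.serialize(format="turtle"))
```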

 

We have reproduced below a list of the benefits of publishing data according to the 5-star rating system, both for publishers and consumers:

 

Benefits of the 5-star rating

★

For the consumer:

  • you can see the data
  • you can print it
  • you can store it (e.g. on your hard drive or on a memory stick)
  • you can change the data as you wish
  • you can access the data from any system
  • you can share the data with anyone

For the publisher:

  • publishing is simple
  • you don’t need to keep repeating that people are allowed to use the data

★★

For the consumer:

  • the same benefits as for the one-star rating
  • proprietary software can be used to process, aggregate, calculate and view the data, and the data may be exported to any structured format

For the publisher:

  • publishing is easy

★★★

For the consumer:

  • the same benefits as for the two-star rating
  • you are able to handle the data as you wish, without having to use particular software

For the publisher:

  • publishing is even easier

★★★★

For the consumer:

  • the same benefits as for the three-star rating
  • you are able to bookmark the data
  • you are able to reuse part of the data
  • you are able to reuse existing tools and data libraries, even if these are only partially compliant with the standards used by the publisher
  • you can combine the data with other data

For the publisher:

  • you have fine-grained control over data items and can optimize access to them
  • other publishers may link to your data, promoting it to 5 stars

★★★★★

For the consumer:

  • you can uncover more linked data whilst consuming the data
  • you can learn about the 5-star rating

For the publisher:

  • you make your data easier to find
  • you add value to your data
  • your organization enjoys the same benefits of linking data as consumers
 

 

Linked Data and Open Data

Linked Data and Open Data. These terms sound similar, but refer to different concepts. In fact, Linked Data complements the Open Data movement. Ideally, the two would go hand in hand.

Linked Data is the next development from the concept of Open Data, and it requires the latter to exist. While the concept of Open Data refers to publishing information and ensuring universal access to it, the concept of Linked Data refers to connecting these data to other sets of data. Together, these two movements not only make documents available, but also provide related information that explains and describes the content, its meanings and the relationship between the data shown.

An example of Linked Data is DBpedia, which extracts information from Wikipedia and makes it available under free licenses (the Creative Commons Attribution-ShareAlike 3.0 License and the GNU Free Documentation License), in addition to attaching other datasets found on the Web to Wikipedia data.

Along the same lines there is also GeoNames, a free geographic database, accessible under a Creative Commons license, which makes in excess of 10 million names available.
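A short sketch of what consuming Linked Data looks like in practice: querying DBpedia’s public SPARQL endpoint for one property of a resource. It assumes the SPARQLWrapper Python package; the exact properties returned depend on DBpedia’s current extraction.

```python
# Query DBpedia's public SPARQL endpoint for the English abstract of one resource.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?abstract WHERE {
        <http://dbpedia.org/resource/Open_data> dbo:abstract ?abstract .
        FILTER (lang(?abstract) = "en")
    }
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    # Print the first few hundred characters of each matching abstract.
    print(row["abstract"]["value"][:300])
```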

Hence, it may be said that Open Data and Linked Data walk hand in hand towards the development of the Semantic Web, which represents “large-scale integration of data available on the Web”. According to Tim Berners-Lee, the creator of the World Wide Web:

“The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data”.

Watch Tim Berners-Lee’s TED Talk on the next web:

How to Open?

In order to be regarded as open, public data must be comprehensive, accessible, primary (without statistical treatment), current, machine-readable, non-discriminatory (e.g. not requiring registration) and non-proprietary, and its licenses must ensure these principles without limiting freedom of use.

Much publicly available data is not really open. It may have been published in proprietary formats that software cannot easily read, and under restrictive licenses; it may be available only in HTML tables, plain text files or PDFs. Developers must, therefore, translate these data, cross-reference them and republish them according to the rules and principles set forth above.

Institutions that wish to open their data must prepare an activities plan. This plan covers everything from determining which data will be published, to how the data will be published and viewed, to strategies to promote the use of such data by communities and activists.

The international movement for government data opening is based on 3 laws proposed by David Eaves:

  • If data can’t be spidered or indexed, it doesn’t exist.
  • If it isn’t available in open and machine readable format, it can’t engage.
  • If a legal framework doesn’t allow it to be repurposed, it doesn’t empower.

 

In other words, the first step towards opening data is identifying the information controlled by governments, companies and other institutions. The data must then be converted into a machine-readable format and, finally, made accessible to all.
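As a small, concrete illustration of that conversion step (the move from two-star to three-star data described earlier), here is a hedged sketch that takes data published as an Excel spreadsheet and republishes it as CSV and JSON. The file names are placeholders; reading .xls/.xlsx with pandas requires the xlrd or openpyxl package.

```python
# Republish a (hypothetical) spreadsheet in non-proprietary, machine-readable formats.

import pandas as pd

# Proprietary but structured source file (2 stars); name is a placeholder.
df = pd.read_excel("budget_2012.xls")

# Tidy the column headers so downstream tools can rely on stable names.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Open, non-proprietary formats (3 stars).
df.to_csv("budget_2012.csv", index=False)
df.to_json("budget_2012.json", orient="records", force_ascii=False)

print(f"Republished {len(df)} rows as CSV and JSON")
```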

We have listed a series of documents below which may be used as guidelines by governments, developers and others interested in data opening processes. Check out:

 

Open Government Data and Laws on Access to Information

The initial steps towards sharing public information were taken through “laws on access to information“, first enacted in Sweden in 1766. Finland followed with regulations on the topic in 1951, and the U.S. in 1966. Currently, about 80 countries have adopted some kind of legislation on access to information, based on two operating principles: reactive and proactive transparency.

The principle of reactive transparency sets forth that governmental bodies are legally obliged to reply to requests for public information made by citizens, usually within five to thirty working days. The principle of proactive transparency, on the other hand, requires agencies to share and publish information which has not necessarily been requested by citizens.

The nature and format of the data published vary depending on the country’s regulations, although most countries require proactive sharing of information related to institutional data, the role of public agencies, services offered, procedure rules and lists of employees and authorities. Several laws of access to information also require public sharing of budgets and public agreements, for example.

With regard to format requirements for public data sharing, it is important to stress that these laws were created before the digital age and, therefore, were aimed at sharing public documents, not the information used to prepare such documents.

Contrastingly, more recent regulations implemented in the digital age require governments to publish information online. This is the case of the Chilean law on access to information, approved in 2009, which requires authorities to proactively publish up-to-date public information on their websites. The latter includes: organic structure, roles, regulatory procedures, lists of public services provided, means of access to such public services, and lists of public employees and their respective salaries, among others.

Older laws of access to information, such as the Canadian and the American legislation, have been reviewed to adapt to the digital age. Hence, these revisions require information to be published online and set forth that both requests and responses to requests must be submitted electronically.

Hence, Open Government Data (OGD) may be viewed as the natural next step and legacy of the principle of proactive transparency in public information sharing, regulated by laws on access to information. In fact, both laws of access to information and OGD may be viewed independently, and not necessarily linked to each other. We may also infer that a broader understanding of information access rights – particularly when legally enforced – and the State’s duty to enforce such rights provides a more substantial and effective basis for the design of Open Government Data initiatives.

In other words, if grounded in the aforementioned laws on access to information, open government data policies will not hinge on whether or not citizens have the right to access information; instead, they will be concerned with the amount and format of the data made available.

What is it?

 

 

What is Open Data?
According to the Open Knowledge Foundation, a non-profit organization, “open data is data that can be freely used, reused and redistributed by anyone.” It involves the publication and sharing of information online in open formats, readable by machines, which may be freely and automatically reused by society.

 

When is data regarded as open?
Data is regarded as open when there is:

  • Availability and access: data must be available as a whole, at no more than a reasonable reproduction cost, preferably by downloading; it must also be available in a convenient and modifiable format.
  • Reuse and redistribution: data must be provided so as to enable reuse and redistribution, including cross referencing with other datasets.
  • Universal participation: anyone can use, reuse and redistribute it, without discrimination against industry, people or groups (restrictions such as “non-commercial” that prevent commercial use are forbidden, as well as limited use for certain purposes, such as “education only”).

 

What types of data can be open?
All data can be open!
Those typically interested in opening data include governments, companies, activists, and teaching and research institutions.

 

Why open data?

Opening data enables:

  • Transparency and democratic control;
  • Population engagement;
  • Citizen empowerment;
  • Better or new private services;
  • Innovation;
  • Improved efficacy and effectiveness of governmental services;
  • Assessment of the impact of policies;
  • Uncovering new things by combining data sources and standards.

 

What about open government data?
This is information produced by governments that must be made available to all citizens for any purpose. Government data are regarded as open when they comply with the laws and principles described above.

 

What are they for?
For reuse by citizens and organizations, who can verify, clarify, inspect and monitor the data according to their interests. Opening public data strengthens institutions, enables citizenship and social control, fights corruption, promotes transparency, enables inspections and fosters new ideas for public policies from within society itself.
Citizen engagement enables the government to improve its processes and increase the transparency of public administration. This happens because available open government data makes clear how the sectors that are not yet aligned with social control and service goals actually operate.

 

How does it work in practice?
Opening data enables, for example, the creation of a mobile phone application showing where the public schools in an area are located, how vacancies are distributed and where demand for places is highest; or showing how public money is being spent, or even public safety levels in a given municipality or neighborhood.