Open Data for Development in Latin America and the Caribbean » Open Government Data


Open Government Data

Management of Open-Data Cycle

The life cycle of open data, whether governmental or not, is long, and every stage of it must be attended to if publishing initiatives and the data they release are to remain sustainable. In other words, for the benefits promised by scholars who promote the open-data philosophy to become something concrete for humanity, it is necessary to consider every point involved in publishing open data on the Web.

Below is a chart to better illustrate the relationships between the main stages involved in data publishing.

[Diagram: management of the open-data cycle]

(this graphic is published in an open vector format – svg – on GitHub)

According to the chart, the cycle a given dataset follows may begin at the COLLECTION stage.
An information age in which hardware devices are available at relatively affordable prices offers an abundance of possibilities for information sources to be creative when accumulating data that may benefit people in the near future.
Collection Situations
Spontaneously provided forms, sensors, and other data sources may be used to establish interesting bases. It is important to note that monitoring should not rely on people, but on the environment in which they live. If any information depends on a human to provide it, it should always be protected by privacy laws, and contributing should be voluntary and free, with no obligations. The advantage of designing data collection for a given situation is the possibility of providing facilitation tools for all the other phases. This means that if the collection solution is planned, it can incorporate important elements, such as semantic web resources and accessibility features, that save labor (particularly in the use and reuse phases) and qualify the database in question in innovative ways.

A good example of this is the base used for the “Open Self Medication” application, which combines semantic databases on drugs, their component substances, and the symptoms of each disease, relating each case in different ways that can, for example, reduce errors at the moment of prescribing a drug.

Extraction Situations
Extraction comes into play when data has been published in closed form. Almost always these are closed, proprietary, or unstructured formats, with fields that are either missing or contain erroneous or useless information. It is no wonder that the process of making such data usable and analyzing it is called mining. Likewise, the process of extracting public data from closed or limited-access databases through coding is known not only as hacking but also as web scraping.

Several methods and tools exist for extraction. The DadosGovBr GitHub account hosts a very useful repository that documents some related tools.
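As a sketch of the scraping process described above, the snippet below extracts the rows of an HTML table using only Python's standard library. The page content is a made-up example, not a real government site; real extraction jobs typically rely on dedicated libraries such as BeautifulSoup or lxml.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a scraped government page.
SAMPLE_PAGE = """
<table>
  <tr><td>School</td><td>Enrollment</td></tr>
  <tr><td>Escola A</td><td>312</td></tr>
  <tr><td>Escola B</td><td>198</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows
        self._row = None      # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.rows)  # [['School', 'Enrollment'], ['Escola A', '312'], ['Escola B', '198']]
```

Once rows are structured like this, they can be exported to CSV or JSON for the storage and publication phases that follow.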

Storage/Publication/Distribution of Open Data
Once one has the data in hand, if the intention is to publish or republish it, the data must be stored in repositories structured and designed to receive and distribute it in an open and interoperable manner. These are the catalogs.
Forming consistent catalogs requires some basic rules that were initially defined for cataloging library content. With the advent of the information age, these rules were transposed and adapted to the context of building data catalogs. Users can rely on catalogs to refine searches or to help interpret entries. According to the Brazilian Internet Steering Committee:

The semantics of information must be agreed in advance, so that all parties have a common understanding of the meaning of the data exchanged. At the international level, this may be a complex issue, since certain legal concepts differ from one country to another. The ultimate goal is to be able to interpret data evenly between the different platforms and organizations involved in data exchange. To do this, it would be useful to publish on the Web the names and definitions of the elements used in a shareable and referenceable format, regardless of the degree of support obtained.

Semantic qualification of data can add much to the chosen database. Thesauri, taxonomies, vocabularies, and classification schemes, among others, are resources used to produce 5-star data. Selecting or constructing these tools requires a full understanding of some of the Web standards and tools for data catalog storage and publishing.
One of the most frequently used tools for viewing catalogs is CKAN, which is also a publishing, storage and management tool for datasets. CKAN is free software, developed and maintained by a community, which means it has no cost and a gentle learning curve. Dados.gov.br currently uses this software to maintain the Brazilian government’s open-data portal.
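To illustrate how a CKAN catalog can be queried programmatically, the sketch below builds a request against CKAN's documented Action API (`package_search`). The portal URL and search term are examples; a given portal's actual endpoint layout may differ.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The /api/3/action/package_search path is part of CKAN's Action API;
# the portal URL and the search term "saude" are illustrative examples.
PORTAL = "http://dados.gov.br"
params = urlencode({"q": "saude", "rows": 5})
url = f"{PORTAL}/api/3/action/package_search?{params}"
print(url)  # http://dados.gov.br/api/3/action/package_search?q=saude&rows=5

# Uncomment to run the query for real (requires network access):
# with urlopen(url) as resp:
#     result = json.load(resp)
#     for pkg in result["result"]["results"]:
#         print(pkg["name"], "-", pkg.get("title"))
```

Because every CKAN instance exposes the same Action API, a script like this works unchanged against any CKAN-based portal by swapping the `PORTAL` value.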
Regardless of the software suite adopted for storage and publishing, it is important to include certain standard concepts in the open-data cycle planning. They are (taken from here):

  • URI: a resource identifier used to identify or locate something on the Web
    • A URL is a URI that identifies a resource and provides a means to act on it, obtain and/or represent it, describing its primary access mechanism or location on the “web”.
      For example, the URL http://www.w3c.br/ is a URI that identifies a resource (the W3C Brasil website), represents this resource (the HTML page, for example), and is available via HTTP from a network host (http://www.w3c.br).
    • Below is a diagram showing the structure of a URI.

[Diagram: URI structure]
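The same decomposition can be reproduced with Python's standard library. The scheme and host below come from the example URL in the text; the path, query, and fragment are invented for illustration.

```python
from urllib.parse import urlparse

# Split a URI into its structural components.
uri = "http://www.w3c.br/path/page?lang=pt#section"
parts = urlparse(uri)

print(parts.scheme)    # "http"       - the access mechanism
print(parts.netloc)    # "www.w3c.br" - the network host
print(parts.path)      # "/path/page" - the resource on that host
print(parts.query)     # "lang=pt"    - parameters
print(parts.fragment)  # "section"    - a location inside the resource
```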


  • RDF/XML: RDF (Resource Description Framework) is the W3C’s standard model for describing resources on the Web; RDF/XML serializes RDF data in XML, a W3C standard format for creating documents with data organized in a hierarchical fashion, as often seen in formatted text documents, vector images, or databases.
  • SPARQL: pronounced “sparkle”, also recommended by the W3C and administered by the W3C Semantic Web groups, is a query language used to search for information independently of the format of the results. One can also use SPARQL to work with data in RDF.
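As a sketch of how a SPARQL query travels over HTTP (per the SPARQL Protocol, the query goes in a `query` parameter and the result format is negotiated via the `Accept` header), the snippet below composes a request without executing it. The endpoint URL is hypothetical and the query is illustrative only.

```python
from urllib.parse import urlencode

# A small SPARQL query: fetch ten resources and their labels.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?drug ?label
WHERE {
  ?drug rdfs:label ?label .
}
LIMIT 10
"""

endpoint = "http://example.org/sparql"  # hypothetical endpoint
params = urlencode({"query": query})
request_url = f"{endpoint}?{params}"

# Asking for JSON results via content negotiation:
headers = {"Accept": "application/sparql-results+json"}
print(request_url[:60], "...")
```

An HTTP GET to `request_url` with those headers is all a client needs; libraries such as SPARQLWrapper package this same pattern behind a friendlier interface.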

There are standards for publishing data in open formats. To provide an interoperable environment across all e-gov domains, it is imperative that laws and/or governmental recommendations specify and regulate these standards. When data is dynamic, interoperable, and fed systemically, costs fall and processes are incorporated into administrative routines (replacing manual tasks such as filling in Excel spreadsheets). In practice, this means that planning may be slow and expensive, but it reduces the cost of maintaining a sustainable environment.
The Importance of APIs
When it comes to large volumes of dynamic open data, the best way to plan an open-data initiative is to include a conversation about APIs and seriously consider using them. APIs are interfaces through which software components communicate with each other.
API stands for Application Programming Interface. An API is a set of predetermined programming rules that enables applications to obtain data from layers invisible to the average user. They connect and keep “working”, interoperating multiple systems and applications whenever data is requested. APIs should be open and transparent so that developers may access them and suggest new features to improve their applications.
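A minimal sketch of the consumer side of such an API: an application receives a JSON payload and reshapes it into something directly usable. The payload and field names below are invented for illustration, not taken from any real government API.

```python
import json

# Hypothetical response from an open-data API endpoint.
api_response = """
{
  "dataset": "complaints",
  "results": [
    {"state": "SP", "count": 120},
    {"state": "RJ", "count": 85}
  ]
}
"""

data = json.loads(api_response)

# Reshape the rows into a lookup table for the application layer.
totals = {row["state"]: row["count"] for row in data["results"]}
print(totals)  # {'SP': 120, 'RJ': 85}
```

Because the interface contract is just structured data over HTTP, the same few lines serve a mobile app, a visualization, or another government system equally well.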
The presentation that accompanies this part of the course is available for download here and available to read online.
Use / reuse
Continued in the next post, on visualizations and applications.

Open Data is Hot Topic at the W3C Brazil Conference

The city of São Paulo hosted the 4th Web.br Conference on October 18-20 – an event promoted by the W3C Brazil office to debate the future of the Web – and Open Data was one of the hot topics debated.

According to the manager of W3C Brazil, Vagner Diniz, debating open data is paramount for the Web’s progress. Hence, the topic appeared across several parts of the program, such as panels, lectures and coffee-break chats, as well as during the hackathon:

“There is an ever increasing number of devices capable of connecting to the Internet. The connection of several different types of devices to the Internet only makes sense if such devices can communicate with each other – i.e. if they can exchange information between them – so that this data sharing enables better use of each of the connected devices. When we talk about an open Web, which was the theme of this Conference, we are talking about a Web that comprises these connected devices. And by talking about a Web that comprises these connected devices, we are referring to Open Data. For it is paramount to have data capable of traveling from one place to another seamlessly, or data that enables me, from my device, to access data on a different device, thus enriching my Web experience”.

With a full room at 9 a.m., the talk by Jeanne Holm (Data.gov evangelist and Chief Systems Architect at NASA’s Jet Propulsion Laboratory) presented the U.S. experience of opening its data and its impact on citizens’ lives. In an exclusive interview, Jeanne Holm said that the U.S. Government’s focus on this topic is on how to provide more data, information and services to citizens, so as to enable them to make better decisions daily.

According to her, the government’s Open Data initiative involves 180 agencies, which have already provided access to 400 thousand databases.

“What is interesting about this is that when developers come together, such as in an event like this Conference today, they get their hands on these data and create applications or websites, or data journalists analyze them and help people understand what those data mean.”

Here you can watch the whole interview:

Another highlight of the Conference was an announcement by the Ministry of Justice confirming its first publication of data on the website dados.gov.br. According to Francisco Carvalheira, Coordinator of the Ministry of Justice’s Transparency and Access to Information Program, the institution decided to open its database of consumer complaints received through Procons (Consumer Protection Agencies) across the country. A “substantiated complaint” is an administrative procedure provided for in the Consumer Protection Code that represents 15% of the complaints registered by Procons.

“We believe that society will be able to come up with potential uses for this database. We believe that by publishing this database in open format we’ll be contributing to the actual consumer protection public policy.”

The announcement was made by the Ministry of Justice during the panel “How to make the most of the Access to Information Act”. During this presentation, Francisco Carvalheira said that the institution has so far received 2,047 requests for access to information.

Also, to ensure the practical side of the debates, Web.br created a space for journalists, programmers and web designers to work together on existing databases to produce information. During the Decoders hackathon, Open Data cases were presented and application prototypes were created using public databases.

Zeno Rocha is a developer; he says that he and a friend created a game especially to be presented at Decoders and to motivate participating developers. According to him, the application aims at providing young Facebook users with information on politicians in a fun way.

The developers Kako and Rafael, in turn, saw their “Transpolitica” project win the hackathon. This was the first time either of them had worked with Open Data.

Like Hurricane Maps? Thank Open Government Data Nerds | Techcrunch

Article originally published on Techcrunch.com

As Hurricane-battered East Coasters turn to online crisis maps for weather updates and evacuation notices, we should all take a moment to give kudos to the spreadsheet nerds who advocated opening up the very government data reserves that now fuel these online tools.

From Google’s hurricane hub to The New York Times evacuation map, life-saving online tools draw from a recent and relatively underfunded set of government programs that release information in ways conducive to third-party developers. “Open data is critical in crisis situations because it allows government to inform and serve more people than it ever could on its own through conventional channels. By making data freely available in a usable format for civic-minded developers and technology platforms, government can exponentially scale its communications and service delivery,” New York City’s Chief Digital Officer, Rachel Haot, writes to TechCrunch in an email (hopefully from a safe place).

The small but tenacious open data movement is based on a faith that citizens can build amazing, yet unknown tools with the vast reams of data warehoused on government servers. “We are enabling entrepreneurs and innovators across all walks of life to tap into fields of data sitting in the vaults of government in machine-readable form,” said Todd Park, President Obama’s Senior technology advisor. They’ll “create all kinds of services and products that we can only even barely imagine.”

It was President Reagan who originally released Global Positioning System (GPS) data, in response to a downed airliner that had accidentally wandered into Soviet territory; yet he could never have foreseen that GPS would eventually power an entire industry of smartphone and automobile navigation products.

In between national crises, open data advocates are relegated to the bottom of the totem pole of government priorities. After all, in the midst of a crippling recession and ongoing trillion-dollar foreign wars, paying the salaries of programmers to transfer private data onto public spreadsheets is a tempting program to put on the chopping block (and it is therefore constantly under threat of defunding). When open data is attached to partisan lightning rods like healthcare, it can evoke the wrath of small-government pundits.

But thanks to their faith in the power of liberated data, East Coasters are a bit safer (and the rest of the world has cool products like Google Maps). So the next time you read a story about a programmer ferociously demanding open data for some seemingly obscure government service, like parking meters, comment at the bottom of the article with a simple “thanks.” You never know how the fruit of his labor will affect you or your loved ones.


Web.br Conference debate on Open Data

The 4th Web.br Conference began this Thursday in São Paulo. The event aims to uncover new paths and steer the debate on the future of the Web, highlighting its most relevant topics and discussing how to universalize it according to the W3C’s principles: a Web for all, from any device, anywhere, and in any language or culture.

Brazilian and foreign experts will talk about HTML and CSS3, Web accessibility for disabled people, the Semantic Web and data visualization. During the three-day Conference, the Open Data movement will also be discussed, as well as its impacts on the Access to Information Act.

On Saturday, the Conference will give way to Decoders W3C Brasil, a collaborative hacker event (hackathon). “It doesn’t matter whether you are a journalist, designer, sysadmin or gardener; all that matters is that you are willing and have a laptop to spend the afternoon hacking on Open Data.”

Hence, OD4D will monitor Web.br and bring you all the relevant discussions on Open Data, as well as the cases presented and, of course, whatever is created during Decoders.
Do not miss it!

Follow the event live and our coverage on SoundCloud.

Developing Latin America

Desarrollando América Latina (Developing Latin America) is an event aimed at fostering application development, which takes place simultaneously in eight Latin American countries: Argentina, Brazil, Bolivia, Chile, Costa Rica, Mexico, Peru and Uruguay. Its goal is to gather web developers, webmasters, web designers and journalists, among other professionals, in a new application contest based on reusing open data.

It is renowned as the region’s biggest collaborative hackathon and is currently in its second edition. In 2011, the applications created using open data addressed issues such as health, education and security. Among the roughly 50 applications created, Onde Acontece? won first prize. Its idea was to cross-reference various data and provide information on public safety.

Brazilian Open Data Portal

A data repository, the dados.gov.br portal aggregates 82 public datasets formerly scattered across the Internet. Launched by the Ministry of Planning, the project design also had extensive contributions from society. Moreover, the website enables people to suggest new data for opening, to participate in Open Data events and to keep up to date with the portal’s development initiatives.

Users can also check out a few applications developed by communities using data available through the portal. One of the applications is the so-called “Basômetro“, a tool that enables measuring parliamentary support to the government and monitoring members of parliament’s stances on legislation votes.
Another application available on the website pinpoints work accidents between 2002 and 2009 on a map of Brazil. Users can view accidents by municipality and by type.

The dados.gov.br portal is part of the National Open Data Infrastructure (INDA), a project aimed at setting technical standards for Open Data, promoting training and sharing public information using open formats and free software.

W3C Brazil launches Open Data portal in Latin America

Stemming from a project created in partnership with the ECLA, the website will be launched this Wednesday in Ecuador.

To contribute to the development of Open Data strategies leading to accountability, innovative services and effective public policies, thus promoting a more inclusive knowledge economy in Latin America and the Caribbean: this is the objective of the Open Data for the Development of Public Policies in Latin America and the Caribbean (OD4D) project, implemented in partnership with Canada’s International Development Research Centre (IDRC), W3C Brazil and the Economic Commission for Latin America (ECLA).

The OD4D portal was created as a means of providing constant updates on the project and on the progress of the global debate on Open Data. It will be launched this Wednesday, October 10, in Quito, the capital city of Ecuador during a preparatory meeting for the IV Ministerial Conference on the Information Society in Latin America and the Caribbean (eLac), promoted by the ECLA.

To contribute to the body of knowledge on Open Data and its potential to improve the quality of public policies in the region, the OD4D website compiles articles, documents, videos and a range of data on the topic. In addition to the content produced through the project (manuals, guides, scientific articles, lectures, seminars, workshops), the website invites contributions from society to add to its content.

According to Vagner Diniz, manager of W3C Brazil, which will be launching the portal in Quito, the trilingual (Portuguese, Spanish and English) channel focuses on research on the impact of the use of open data on public policy-making and local economic development: “The idea is to promote debates, as well as to produce and share materials on the topic. The portal will share workbooks, manuals and references to several portals worldwide. It will work as an aggregator, i.e. a repository of information.”

Vagner also notes that, although the whole project is being developed in partnership with the ECLA, content management and production for the portal will be the sole responsibility of W3C Brazil, which is already renowned for its reference publications on Open Data.

About the W3C Brazil office – W3C.br
In line with CGI.br’s deliberations and the requirements set forth by the W3C (World Wide Web Consortium), NIC.br launched the W3C office in Brazil, the first in South America. The W3C is an international consortium that promotes the realization of the Web’s full potential by creating standards and guidelines to ensure its constant development. Over 80 standards have already been published, among them HTML, XML, XHTML and CSS. The W3C in Brazil supports the global goals of a Web for all, from any device, based on knowledge, security and responsibility. More information available here.

About the Brazilian Network Information Center – NIC.br
The Brazilian Network Information Center (nic.br) is a civil, non-profit entity that implements the decisions and projects of the Brazilian Internet Steering Committee. The NIC.br is permanently in charge of coordinating the domain name registry – Registro.br, of studying, answering and dealing with security incidents in Brazil – CERT.br, of studying and researching network and operation technologies – CEPTRO.br; of producing indicators on information and communication technologies – CETIC.br; and, of hosting the W3C office in Brazil.

About the Brazilian Internet Steering Committee – CGI.br
The Brazilian Internet Steering Committee coordinates and integrates all Internet service initiatives in the country, promoting technical quality, innovation and awareness of the services on offer. Based on multilateral, transparent and democratic principles, the CGI.br represents a multi-sector model of Internet governance, effectively involving all sectors of society in its decision-making processes. One of its publications is the so-called “10 Principles of Internet Governance and Use”. More information available here.

Press Contacts:

Daniela Marques
Vanessa Morais
Everton Schultz

Press Relations – NIC.br
Caroline D’Avo – Press Relations Officer
Everton Teles Rodrigues – Communications Assistant

“Most of the data stored by governments is not translated into information or services to the population”

Interview originally published in Blog Públicos – Estado de São Paulo

“Governments are not really aware of the amount and nature of the data they have stored. When they do have a rough idea, they lack the time to consider how that data can be applied and converted into services for the population.”

The general manager of the W3C consortium in Brazil – an international community of 300 private and state enterprises and universities that work together to develop Web standards – Vagner Diniz maintains in his interview to Públicos that governments must allow civil society to decide which public data are of interest to the population. He also believes that both parties must join forces to make the data supply meet the demand for information.

“We cannot just sit around waiting for the government to publish information, wasting money on data that might not even be of interest to the population. We will try to identify which data can be actually useful, create a demand for it and reach an agreement with government bodies to come up with a framework of priorities,” he says.
According to Diniz, civil society can spot possibilities in the data that are overlooked by governments. “Two hundred million people will see much more than 4 or 5 million civil servants.”

Why is it important for governments to publish their data in open formats?
The amount of data gathered and not used by governments ends up creating a useless mass of information. Governments use only the portion of the data that they need for administrative purposes. Most of them are not translated into information or services for the population. Governments are not really aware of the amount and nature of the data they have stored. When they do have a rough idea, they lack the time to consider how that data can be applied and converted into services to the population.

How important is this information to civil society?

What’s most important in making this information available is allowing the population itself to say: “This set of data might interest me, it is useful to me. Let me use it because I’ll be able to come up with scenarios in which it is relevant, while you as government have too many other concerns that prevent you from seeing what I can see.” In other words, it’s the idea that two hundred million people will see much more than 4 or 5 million civil servants. With governments worldwide starting to open their data, organizations, communities, interested individuals, Web programmers and volunteers have created interesting application software to make use of the data available.

What about to governments?
Curiously, this has generated an exchange of data within governments themselves. Different government bodies now have access to information from other bodies, which was previously very difficult to obtain due to endless bureaucratic processes.

This will undoubtedly contribute to greater government efficiency. But how can we guarantee that the immense supply of data stored by governments will meet society’s demand for information?
That is a tough task which I do not expect to see easily accomplished. Reaching an ideal stage of free-flowing information from government to society will be a hard process. It will involve raising awareness. There is a lot of resistance to publishing public data because the government sees itself much more as a proprietor than a custodian of that data. Public data are public, they belong to the population, and governments are custodians of data, but they act like proprietors. They fear what will be done to “their” data. A second effort involves qualification, as publishing these data in open formats demands a certain degree of technical expertise. We have to study the technologies that allow data to be openly published on the Internet. We must train people to do this.

Now…
…lastly, there must be an open and frank dialogue between the custodians of the data, the government bodies, and those interested in having access to the data, civil society organizations and many private citizens. We will try to address priorities. We cannot just sit around waiting for the government to publish information, wasting money on data that might not even be of interest to the population. We will try to identify which data can be actually useful, create a demand for it and reach an agreement with government bodies to come up with a framework of priorities.

You once mentioned that developing application software is much easier than gathering consistent data. Could you explain this?
Developing an application based on data available merely involves creating a code which any slightly experienced web developer can read and freely apply to his own application. It is quite simple, much like creating a Web page. You don’t even have to be a Web developer to create a Web page nowadays, thanks to the tools available. Publishing data in an open format is more complicated, given that you, as the custodian of that data, have many other concerns besides the technical aspect of making the data available. It’s about more than that…

Yes…
…you have to make sure that the data is consistent. There cannot be another dataset with information that clashes with the data being published. You will publish three, four, ten databases, and any similar information they contain cannot be inconsistent. Secondly, there are security issues you need to worry about. You cannot allow the person who will use the data to alter them in any way. Thirdly, the data being published must be certified. Because if someone happens to misuse these data and alter them in any way, and then claim to have obtained the information from a government website, you, as the publisher, can prove that the original data were altered by that person. So there are many aspects to be considered when making information available.

Can you give an interesting example of data inconsistency?
I had an experience as IT director of a city in the state of São Paulo. A typical case was the city’s streets register. Each city hall department had its own register, with fields tailored to the needs of each department. The finance department’s register was geared towards collecting property tax, while the register of the public roads department focused on road works. The legal department was more focused on executing outstanding debts, and so forth. I counted six or seven registers. All of them had different information about the same streets. Even worse, the street names also differed among the registers, with different abbreviations. You never knew if a street in one register was the same as in another. It was also impossible to unify these registers, as they had different formats. This poses a serious problem when the information is made available, as different registers show the same information in different ways.

This reveals not only the size of the problem, but also the growing need to standardize government information.
Absolutely. This has been critical since the adoption of information technology in the organization of corporations. The need for standardization goes way back. Professionals in the area joke that the purpose of information technology is not to help you get better organized, but to help you make the same blunders you used to make without it (laughs). When you computerize an environment without altering processes and standardizing information, you will just do the same things you did before, only more quickly.


Can the private sector benefit from open data? If so, how?

I believe so, although the private sector has not yet realized this. It can benefit greatly in many areas of the open data value chain, especially technology businesses. One example is publishing open data on the Web. Moreover, creative and innovative businesses will scrutinize the open data carefully and be able to find ways to reuse and transform these data into commercially valuable services.

Can you give an example?
Nowadays, the IBGE Census is a rich source of information. It contains a lot of data on the country, the citizens, their distribution and characteristics. If these data are made available they can be extremely useful, while ensuring the right to confidentiality of personal data. Based on them you could, for example, offer consultancy services for new businesses based on socioeconomic profiles; you could also give advice on which businesses are in demand based on household profiles. There is another example in operation in Brazil called Gas Finder, a mobile-phone application which allows users to locate nearby gas stations. It is extremely useful and was developed using data available on the website of the National Oil Agency. You don’t necessarily have to generate income by charging the customer directly; income may be generated from ads displayed with the information. All it takes is entrepreneurship and creativity.

Cases: Apps for Democracy

The idea was born in 2008 out of the DC government’s desire to ensure that society, government and businesses alike could make good use of DC.gov’s Data Catalog (which provides, for example, public information on poverty and crime indicators in an open format).

Therefore, a competition was created to award the best applications developed using data from the Catalog. The first contest cost Washington, DC US$50,000 and produced 47 iPhone, Facebook and web applications with an estimated value to the city in excess of US$2,600,000.

The application iLive.at won a gold medal for providing crime, safety and demographic information for those looking for a place to live in DC.

Another award-winning project was Park It DC, which allows users to check a specific area in the district for parking information.

Learn more about the project and check out the video at:

Paper by Felipe Heusser. Participate!

Among the expected results of OD4D is the preparation of documents that will serve as a theoretical and methodological basis for the project. This documentation aims at producing further knowledge on Open Data and its potential to improve the quality of public policies in Latin America and the Caribbean.

The idea is that the writings take into account the international literature evaluating the institutional context and technological conditions required for open data initiatives.

The first document is being prepared by Felipe Heusser, founder and director of Fundación Ciudadano Inteligente, a Latin American NGO based in Chile that uses information technology to promote transparency and active citizen participation. The paper Understanding Open Government Data is a work in progress, and you can participate by commenting on the draft, thus assisting in its preparation.

You can also download the file here to read and use the PAD for comments.