Open Data for Development in Latin America and the Caribbean

Data Catalog

Management of the Open Data Cycle

The open data cycle, whether the data is governmental or not, is long, and it must be managed as a whole if publishing initiatives and open data are to be sustainable. In other words, for the benefits promised by scholars who promote the open data philosophy to become concrete for humanity, every stage involved in publishing open data on the Web must be taken into account.

The chart below illustrates the relationships between the main stages of data publishing.

[Figure: the open data chain and its management cycle]

(This graphic is published on GitHub in an open vector format, SVG.)

According to the chart, the cycle that a given dataset follows may begin at the COLLECTION stage.
In an information society where hardware devices are available at relatively affordable prices, there are abundant opportunities to be creative when gathering data that may benefit people in the near future.
Collection Situations
Forms, sensors, spontaneously contributed data and other sources may be used to build useful databases. It is important to note that monitoring should focus on the environment in which people live, not on the people themselves. Any information that depends on a person to provide it should always be protected by privacy laws, and contributing should be voluntary and free of any obligation. The advantage of designing data collection for a given situation is that tools can be built in to facilitate all the other phases. In other words, if the collection solution is planned, it can include important elements, such as semantic Web resources and accessibility features, that save labor (particularly in the use and reuse phases) and add innovative value to the resulting database.

A good example is the database behind the “Open Self Medication” application, which combines semantic databases on drugs, their component substances and the symptoms of each disease, relating them in different ways that can, for example, reduce errors when a drug is prescribed.

Extraction Situations
Extraction is necessary when data has been published in closed form. Such data almost always comes in closed, proprietary or unstructured formats, with fields that are missing or that contain erroneous or useless information. It is no wonder that the process of making such data usable and analyzing it is called mining. Likewise, the process of extracting public data from closed or limited-access databases through programming is known not only as hacking but also as web scraping.

Several methods and tools exist for extraction. The DadosGovBr GitHub account hosts a very useful repository that documents some of them.
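To make the idea of scraping concrete, here is a minimal sketch using only Python's standard `html.parser` module. It assumes the closed data is an HTML page containing a simple `<table>` with no nested tables. The class `TableScraper`, the helper `scrape_table` and the sample page with its numbers are hypothetical, made up for illustration. Real pages usually need more cleanup than this.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Whitespace between tags arrives here too; keep it only inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page fragment standing in for a report published only as HTML.
PAGE = """
<table>
  <tr><th>Municipality</th><th>Complaints</th></tr>
  <tr><td>Municipality A</td><td> 120 </td></tr>
  <tr><td>Municipality B</td><td>45</td></tr>
</table>
"""

rows = scrape_table(PAGE)
header, records = rows[0], rows[1:]
print(header)   # ['Municipality', 'Complaints']
print(records)  # [['Municipality A', '120'], ['Municipality B', '45']]
```

Once the rows are out of the HTML, they can be cleaned and republished in an open, structured format.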

Storage/Publication/Distribution of Open Data
Once the data is in hand, if the intention is to publish or republish it, it must be stored in repositories structured and designed to receive and distribute such data in an open, interoperable manner. These repositories are the catalogs.
Building consistent catalogs requires some basic rules that were originally defined for cataloging library content. With the advent of the information age, these rules were transposed and adapted to building data catalogs. Users can rely on catalogs to refine searches or to help interpret entries. According to the Brazilian Internet Steering Committee:

The semantics of information must be agreed in advance, so that all parties have a common understanding of the meaning of the data exchanged. At the international level, this may be a complex issue, since certain legal concepts differ from one country to another. The ultimate goal is to be able to interpret data evenly between the different platforms and organizations involved in data exchange. To do this, it would be useful to publish on the Web the names and definitions of the elements used in a shareable and referenceable format, regardless of the degree of support obtained.

Semantic qualification can add a great deal of value to a database. Thesauri, taxonomies, vocabularies, classification schemes and similar resources are used to produce 5-star data. Selecting or building these resources requires a thorough understanding of some of the Web standards and tools for storing and publishing data catalogs.
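As an illustration of publishing element names and definitions in a shareable, referenceable form, the sketch below describes one dataset with terms from the W3C DCAT vocabulary and Dublin Core Terms, serialized as JSON-LD. The function `dcat_dataset`, the `example.org` identifiers and the dataset itself are hypothetical. A production catalog would follow a fuller DCAT profile.

```python
import json

# JSON-LD context mapping short prefixes to the W3C DCAT and Dublin Core Terms namespaces.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"


def dcat_dataset(identifier, title, description, keywords, download_url, media_type):
    """Build a minimal DCAT description of one dataset with one distribution (hypothetical helper)."""
    return {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            # Media types are referenced by their IANA registry URI.
            "dcat:mediaType": {"@id": IANA_MEDIA_TYPES + media_type},
        },
    }


record = dcat_dataset(
    "http://example.org/dataset/work-accidents",
    "Work accidents (example)",
    "Hypothetical dataset used to illustrate a DCAT description.",
    ["work accidents", "municipalities"],
    "http://example.org/files/work-accidents.csv",
    "text/csv",
)
text = json.dumps(record, ensure_ascii=False, indent=2)
print(text)
```

Each property name expands to a full URI defined by a published vocabulary. This gives every party the same reference for what a field such as `dct:title` means.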
One of the most frequently used tools for viewing catalogs is CKAN, which is also a tool for publishing, storing and managing datasets. CKAN is free software, developed and maintained by a community, so there are no licensing costs and it is relatively easy to learn. Dados.gov.br currently uses this software to run the Brazilian government’s open data portal.
Regardless of the software suite adopted for storage and publishing, certain standard concepts should be included when planning the open data cycle. They are as follows (taken from here):
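CKAN exposes its catalog through an Action API, so other programs can search it directly. The sketch below builds a `package_search` request URL and summarizes a reply. It does not contact a server. The base URL `https://ckan.example.org` is hypothetical, and the sample reply is hand-written to mimic the general shape of CKAN responses (`success`, `result.results`, and each package's `resources`). Check the portal's own API documentation before relying on specific fields.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API search URL (GET /api/3/action/package_search)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(response_text):
    """Return (title, [resource formats]) for each dataset in a package_search reply."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise ValueError("CKAN reported an error: %r" % body.get("error"))
    return [
        (pkg.get("title"), [res.get("format") for res in pkg.get("resources", [])])
        for pkg in body["result"]["results"]
    ]


# Hypothetical portal address; replace it with a real CKAN instance.
url = package_search_url("https://ckan.example.org", "work accidents", rows=5)
print(url)
# https://ckan.example.org/api/3/action/package_search?q=work+accidents&rows=5

# Hand-written reply in the general shape CKAN returns, used instead of a live call.
SAMPLE = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [
        {"title": "Example dataset", "resources": [{"format": "CSV"}, {"format": "JSON"}]},
    ]},
})
print(summarize(SAMPLE))  # [('Example dataset', ['CSV', 'JSON'])]
```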

  • URI: a resource identifier used to identify or locate something on the Web
    • A URL is a URI that identifies a resource and also provides a means of acting on it or obtaining a representation of it, by describing its primary access mechanism or location on the Web.
      For example, the URL http://www.w3c.br/ is a URI that identifies a resource (the W3C Brasil website), represents this resource (the HTML page, for example), and is available via HTTP from a network host (http://www.w3c.br).
    • Below is a diagram showing the structure of a URI (taken from this site).

[Figure: URI schema]

  • RDF/XML: a W3C-recommended syntax for writing RDF (Resource Description Framework) data as XML. XML itself is a W3C standard format for creating documents with data organized hierarchically, as often seen in formatted text documents, vector images and databases.
  • SPARQL: pronounced “sparkle”, also a W3C Recommendation, developed within the W3C Semantic Web activity. It is a language for querying data in RDF, and query results can be returned in several formats (such as XML, JSON or CSV).
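The three concepts above can be tried out with Python's standard library. This sketch uses illustrative `example.org` URIs and a hypothetical SPARQL endpoint. It does three things. First, it splits a URI into its components. Second, it extracts triples from a deliberately simple RDF/XML document: only `rdf:Description` elements with `rdf:about`, literal values and `rdf:resource` links are handled, and full RDF/XML needs a dedicated parser such as rdflib. Third, it encodes a SPARQL query into a GET URL using the `query` parameter defined by the SPARQL 1.1 Protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlsplit

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# 1. URI: split an illustrative URI into its generic components.
parts = urlsplit("http://www.example.org/datasets/report?year=2012#summary")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
# http www.example.org /datasets/report year=2012 summary

# 2. RDF/XML: a tiny document describing one dataset with Dublin Core terms.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/dataset/1">
    <dct:title>Example dataset</dct:title>
    <dct:publisher rdf:resource="http://example.org/agency"/>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Return (subject, predicate, object) triples from flat rdf:Description elements.

    Only rdf:about subjects with literal or rdf:resource objects are handled.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; RDF joins them into one URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource", prop.text)
            triples.append((subject, predicate, obj))
    return triples


triples = simple_triples(RDF_XML)
for triple in triples:
    print(triple)

# 3. SPARQL: the SPARQL 1.1 Protocol can send a query via GET in the "query" parameter.
QUERY = """PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE { ?dataset dct:title ?title }"""


def sparql_get_url(endpoint, query):
    """Build a GET URL for a SPARQL endpoint (the endpoint used below is hypothetical)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url("http://example.org/sparql", QUERY)
print(url)
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)  # True: the query survives encoding
```

A live endpoint would run this query against its RDF store and return each `?dataset` with its `?title`.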

There are standards for publishing data in open formats. To provide an interoperable environment across all e-government domains, laws and/or government recommendations must specify and regulate these standards. When data is dynamic, interoperable and fed systematically, costs fall and publication becomes part of routine administrative processes (such as filling in tables in Excel). In practice, this means that planning may be slow and expensive, but it reduces the cost of maintaining a sustainable environment.
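One way to make publication part of routine administration is to generate every open format automatically from the same source table. The following sketch writes one list of records as both CSV and JSON using only the standard library. The field names, the function `publish` and the rows are hypothetical. It illustrates the idea rather than any particular government system.

```python
import csv
import io
import json


def publish(records, fieldnames):
    """Serialize the same list of dict records as CSV text and as JSON text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue(), json.dumps(records, ensure_ascii=False, indent=2)


# Made-up rows, standing in for a table kept as part of an administrative routine.
rows = [
    {"municipality": "Municipality A", "year": 2012, "requests": 10},
    {"municipality": "Municipality B", "year": 2012, "requests": 4},
]
csv_text, json_text = publish(rows, ["municipality", "year", "requests"])
print(csv_text)
print(json_text)
```

Because both files come from one source, they cannot drift apart, and republishing after an update is a single call.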
The Importance of APIs
When it comes to large volumes of dynamic open data, the best way to plan the opening of one’s data is to include APIs in the discussion and seriously consider using them. APIs serve as interfaces through which software components communicate with each other.
API stands for Application Programming Interface. An API is a set of predefined programming rules that applications can follow to obtain data from layers that the average user never sees. Once connected, APIs keep “working” in the background, allowing multiple systems and applications to interoperate whenever data is requested. APIs should be open and transparent so that developers can access them and suggest new features that improve their applications.
The presentation that accompanies this part of the course is available for download here and can also be read online.
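To show two software components communicating through an API, the sketch below runs a tiny HTTP server that answers `GET /api/datasets` with JSON. A client in the same script then calls it. The endpoint path, the `DATASETS` contents and the class name `ApiHandler` are hypothetical. A real open data API would add paging, filtering, documentation and fuller error handling.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical in-memory data exposed by the API.
DATASETS = [
    {"id": "example-1", "title": "Example dataset", "format": "CSV"},
]


class ApiHandler(BaseHTTPRequestHandler):
    """Server side: answers GET /api/datasets with JSON; any other path gets a 404."""

    def do_GET(self):
        if self.path == "/api/datasets":
            body = json.dumps({"datasets": DATASETS}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Unknown endpoint")

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_json(url):
    """Client side: another program consuming the API over HTTP."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0 lets the OS pick a free port
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    port = server.server_address[1]
    data = fetch_json(f"http://127.0.0.1:{port}/api/datasets")
    print(data["datasets"][0]["title"])  # Example dataset
finally:
    server.shutdown()
    server.server_close()
```

The client never sees how the server stores its data. It relies only on the agreed path and JSON shape, which is what lets independent applications interoperate.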
Use / reuse
Continued in the next post, on visualizations and applications.

Open Data Is a Hot Topic at the W3C Brazil Conference

On October 18-20, the city of São Paulo hosted the 4th Web.br Conference, an event promoted by the W3C Brazil office to debate the future of the Web, and Open Data was one of the hot topics.

According to Vagner Diniz, manager of W3C Brazil, debating the opening of data is paramount for the Web’s progress. Hence, the topic featured in several program activities, such as panels, lectures and coffee-break chats, as well as in the hackathon:

“There is an ever-increasing number of devices capable of connecting to the Internet. Connecting many different types of devices to the Internet only makes sense if those devices can communicate with each other (that is, if they can exchange information), so that this data sharing enables better use of each connected device. When we talk about an open Web, which was the theme of this Conference, we are talking about a Web made up of these connected devices. And when we talk about such a Web, we are referring to Open Data. It is paramount to have data that can travel seamlessly from one place to another, or data that lets me access, from my device, data on a different device, thus enriching my Web experience.”

Speaking to a full room at 9 a.m., Jeanne Holm (Data.gov evangelist and Chief Systems Architect at NASA’s Jet Propulsion Laboratory) presented the U.S. experience of opening its data and its impact on citizens’ lives. In an exclusive interview, Jeanne Holm said that the U.S. Government’s focus on this topic is on providing more data, information and services to citizens, so that they can make better decisions every day.

According to her, the government’s Open Data initiative involves 180 agencies, which have already provided access to 400 thousand databases.

“What is interesting about this is that when developers come together, at an event like this Conference today, they get their hands on these data and create applications or websites, or data journalists analyze them and help people understand what the data mean.”

Here you can watch the whole interview:

Another highlight of the Conference was the Ministry of Justice’s announcement of its first data publication on the dados.gov.br website. According to Francisco Carvalheira, Coordinator of the Ministry of Justice’s Transparency and Access to Information Program, the institution decided to open its database of consumer complaints received through the Procons (Consumer Protection Agencies) across the country. A “substantiated complaint” is an administrative procedure provided for in the Consumer Protection Code, and it accounts for 15% of the complaints registered by the Procons.

“We believe that society will be able to come up with potential uses for this database. By publishing it in open format, we believe we will be contributing to consumer protection public policy itself.”

The Ministry of Justice made the announcement during the panel “How to make the most of the Access to Information Act”. During the presentation, Francisco Carvalheira said that the institution had so far received 2,047 requests for access to information.

To give the debates a practical side, Web.br also created a space where journalists, programmers and web designers could work together on existing databases to produce information. During the Decoders hackathon, Open Data cases were presented and application templates were created using public databases.

Developer Zeno Rocha says that he and a friend created a game especially for Decoders, to motivate the participating developers. According to him, the game is an application that gives young Facebook users information about politicians in a fun way.

Developers Kako and Rafael, meanwhile, saw their “Transpolitica” project win the hackathon. It was the first time either of them had worked with Open Data.

Web.br Conference Debates Open Data

The 4th Web.br Conference began this Thursday in São Paulo. The event aims to uncover new paths and steer the debate on the future of the Web, highlighting its most relevant topics and discussing how to universalize it according to W3C’s principles: a Web for all, on any device, anywhere and in any language or culture.

Brazilian and foreign experts will talk about HTML and CSS3, Web accessibility for people with disabilities, the Semantic Web and data visualization. During the three-day Conference, the Open Data movement will also be discussed, as well as its impact on the Access to Information Act.

On Saturday, the Conference will give way to Decoders W3C Brasil, a collaborative hacker event (hackathon): “It doesn’t matter whether you are a journalist, designer, sysadmin or gardener; all that matters is that you are willing and have a laptop to spend the afternoon hacking on Open Data.”

OD4D will follow Web.br and bring you all the relevant discussions on Open Data, as well as the cases presented and, of course, whatever is created during Decoders.
Do not miss it!

Follow the event live and our coverage on: , SoundCloud and .

Open 311

3-1-1 is a well-known phone number in some cities in the United States and Canada, which citizens use to notify the authorities of non-urgent situations such as broken traffic lights, illegal burning and road problems. The goal is to keep 9-1-1 free for emergencies that really need immediate attention.

Open 311 “provides open channels of communication for issues that concern public space and public services. Using a mobile device or a computer, someone can enter information (ideally with a photo) about a problem at a given location. This report is then routed to the relevant authority to address the problem. What’s different from a traditional 311 report is that this information is available for anyone to see and it allows anyone to contribute more information. By enabling collaboration on these issues, the open model makes it easier to collect and organize more information about important problems. By making the information public, it provides transparency and accountability for those responsible for the problem. Transparency also ensures that everyone’s voice is heard and in-turn encourages more participation”.
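Open311's GeoReport v2 specification defines an HTTP API for this kind of report. The sketch below only builds a request and sends nothing. It is based on the "POST Service Request" method as we understand it: a `service_code`, a location such as `lat`/`long`, and optional fields like `description` and `media_url` (a link to a photo). The endpoint, API key, service code and coordinates are hypothetical. Real servers list their service codes through their own services endpoint, and the specification should be checked before use.

```python
from urllib.parse import urlencode


def service_request(endpoint, api_key, service_code, description,
                    lat=None, long=None, media_url=None):
    """Return (url, form-encoded body) for a GeoReport v2 service request (sketch).

    The spec also allows address-based locations; this sketch accepts lat/long only.
    """
    if lat is None or long is None:
        raise ValueError("this sketch requires a lat/long location")
    fields = {
        "api_key": api_key,
        "service_code": service_code,
        "lat": lat,
        "long": long,
        "description": description,
    }
    if media_url:
        fields["media_url"] = media_url  # link to a photo of the problem
    return endpoint.rstrip("/") + "/requests.json", urlencode(fields)


url, body = service_request(
    "https://311.example.gov/open311/v2",  # hypothetical endpoint
    api_key="YOUR_KEY",                    # placeholder key
    service_code="streetlight",            # hypothetical service code
    description="Traffic light not working",
    lat=38.8951, long=-77.0364,            # illustrative coordinates
    media_url="https://example.org/photo.jpg",
)
print(url)
print(body)
```

Because the report travels through an open, documented interface, any application (not only the city's own) can submit problems and read them back.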

Learn more

Brazilian Open Data Portal

The dados.gov.br portal is a data repository that aggregates 82 public datasets formerly scattered across the Internet. Launched by the Ministry of Planning, the project was designed with extensive contributions from society. The website also enables people to suggest new data for opening, take part in Open Data events and keep up to date with the portal’s development initiatives.

Users can also check out a few applications developed by communities using data available through the portal. One of them is “Basômetro”, a tool that measures parliamentary support for the government and tracks how members of parliament vote on legislation.
Another application available on the website plots work accidents that occurred between 2002 and 2009 on a map of Brazil. Users can view accidents by municipality and by type.

Dados.gov.br is part of the National Open Data Infrastructure (INDA), a project that sets technical standards for Open Data and promotes capacity building and the sharing of public information using open formats and free software.

Cases: Apps for Democracy

The idea was born in 2008, when the government of Washington, DC wanted to ensure that society, government and businesses could make good use of DC.gov’s Data Catalog (which provides, for example, public information on poverty and crime indicators in an open format).

Therefore, a competition was created to award the best applications developed with data from the Catalog. The first contest cost Washington, DC US$50,000 and produced 47 iPhone, Facebook and web applications with an estimated value to the city of more than US$2,600,000.
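Taking the contest's own figures at face value, a quick calculation shows the scale of the return. Because the estimated value is given as a lower bound ("more than"), the actual ratio is above 52, and the cost per application works out to roughly US$1,064:

```latex
\frac{\text{US\$}\,2{,}600{,}000}{\text{US\$}\,50{,}000} = 52,
\qquad
\frac{\text{US\$}\,50{,}000}{47\ \text{applications}} \approx \text{US\$}\,1{,}064 \text{ per application}
```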

The application iLive.at won a gold medal for providing crime, safety and demographic information to people looking for a place to live in DC.

Another award-winning project was Park It DC, which allows users to check a specific area in the district for parking information.

Learn more about the project and check out the video at: