Data

311 Datasets

311 service is generally implemented at the local level, and in some cities, it is also used for various municipal calls. The service in chicago uses to collect a bunch of data and release them live to the web for free. Live means that the dataset is updated with a certain frequency that goes from minutes two variuos days, and they can be easily accessed and queried real time with the API that they provide. The main page where to find data can be found starting from here and the SODA2 API webpage used to query the datasets is available here.

Potholes Data

The 311 service keeps a dataset of potholes in Chicago at this page, that can be easily queried by downloading the linked JSON file and appending the parameters in the URL. The parameters we specified have been chosen as explained below:

$select = only the fields we needed to display
$where = latitude and longitude are not NULL (to display the pothole), the pothole hasn't been fixed yet, and it has been pointed out in the last month
$order = order by date to have sorted set to manipulate

The data is retrieved when the user activates the layer and is asynchronously downloaded, parsed and displayed. The frequency of updates is 1 minute, so that the user can possibly see live updates if new data is available.

Street Lights Out Data

The 311 service keeps a dataset of broken street lights in Chicago at this page, where are listed all the poles with 3 or more lights out, and this page where are listed the poles with 1 or 2 lights out. Data can be easily queried by downloading the linked JSON files and appending the parameters in the URL. The parameters we specified have been chosen as explained below:

$select = only the fields we needed to display
$where = latitude and longitude are not NULL (to display the lights on the map) and the broken lights has been pointed out in the last month but still hasn't been given back
$order = order by date to have sorted set to manipulate

Abandoned Vehicles Data

The 311 service keeps a dataset of abandoned vehicles in Chicago at this page, that can be easily queried by downloading the linked JSON file and appending the parameters in the URL. The parameters we specified have been chosen as explained below:

$select = only the fields we needed to display
$where = latitude and longitude are not NULL (to display the vehicle on the map) and the vehicle has been pointed out in the last month but still hasn't been given back
$order = order by date to have sorted set to manipulate

Crimes Data

The 311 service keeps a dataset of crimes in Chicago at this page, that can be easily queried by downloading the linked JSON file and appending the parameters in the URL. The parameters we specified have been chosen as explained below:

$select = only the fields we needed to display
$where = latitude and longitude are not NULL (to display the vehicle on the map) and the crime has been pointed out in the last month
$order = order by date to have sorted set to manipulate

The data is retrieved when the user activates the layer and is asynchronously downloaded, parsed and displayed. With this particular dataset, the handling of the results is more tricky since the data retrieved is significantly bigger (10 times bigger than the data described above), so the asynchronous download and the cache of the data has been fundamental to avoid the application freeze and increase the usability. Also, we decided to classify the crimes into 3 distinct categories that have been assigned to each single crime starting from it's type. For example:

Type = Theft -> Category = Property Crime
Type = Burglar -> Category = Property Crime
Type = Homicide -> Category = Violent Crime
Type = Battery -> Category = Violent Crime
Type = Prostitution -> Category = Quality of Life Crime
Type = Gambling -> Category = Quality of Life Crime

and so on ... In this way turns really meaningful make comparison and statistics on the different kind of crimes.

Divvy Bikes Data

Divvy is a bicycle sharing system that launched in Chicago on June 28, 2013 initially with 750 bikes at 75 stations spanning from the Loop north to Berwyn Ave, west to Kedzie Ave, and south to 59th St. The system was planned to grow to 4,000 bicycles at 400 stations by Spring 2014, however supply shortages have delayed expansion to 2015.

Bikes Data

The company released a bunch of historical data on their launch, and now keeps updating this JSON file every minute with the information of all the divvy bikes stations in Chicago. The application checks if new data is available every 10 seconds, so that the user is actually aware of the stations status in real time. The only issue that we encountered with the download of this file was how bypass of the Cross Site Domain error that was given from our server, and we managed it using this proxy server owned by Google.

Other Resources

We felt like the application could show much more than that, and we decided to extend it by getting other data from external resources to give more useful real time information. In general to get this data we decided to use a php script to authenticate i a easier way with the server, and pass trough the Google proxy mentioned before.

Yelp Data

Yelp is a multinational corporation headquartered in San Francisco, California and provides basic data about businesses such as hours of operation. We have used their APIs to retrieve data about the best classified restaurants in Chicago. Unfortunately the sorting by rate only allows to retrieve 40 results, which were not enough for what we wanted to visualize. So we have decided to retrieve them with the 'best match' criteria, which gives back up to 1000 preferred restaurants sorted with a complex algorithm that takes into account many more factors than just rating. This results have been combined with the data that can found here to highlight the preferred restaurant that have failed a food inspection in the last few months.

Twitter Data

Twitter is a really powerful instrument nowadays, that can give tons of useful information if they area well filtered. The Twitter APIs are really straightforward and easy to use; the biggest challenge has been to find some interesting tweets regarding the purpose of the application: security on the street. We noticed that important tweets regarding crimes and inconviniences come from settle down companies like #BreakingNews that usually don't share their location, while everyday users that are geolocalized cannot be trusted apriori regarding this kind of alerts. Eventually we decided not to the meaningless geotagged tweets on the map, but rather untagged meaningful tweets as simple notifications.

Wikipedia Data

We used wikipedia just as external resource to retrive information about tourist attractions and important places. No API has been used, the information have been taken directly from the Wikipedia web pages.

CTA APIs

Data about bus and trains are retrieved taking advantage of the CTA APIs.
CTA provide two handy manuals for its APIs:

CTA Bus

For Buses the “Bus Tracker API documentation” has been used. Since it provided not so user-friendly “calls” for the purposes of this application, a good design about queries and their optimization has been needed.

Calls Hierarchy

The main API calls are:

getroutes
which returns the list of all the cta routes
getroutedirections?Rt=XXX
which returns all the directions given a route
getstops?rt=XXX&dir=YYY
which returns all the stops given a route and a direction
getvehicles?Rt=XXX
which returns all the vehicles given a route
getpatterns?rt=XXX
which returns a list of the geographic points that compose the route
getpredictions?stpid=XXX
which returns a list of time predictions for a given stop

As it is possible to notice, having a list of vehicles or stops for a particular area is not straightforward. Retrieving the whole list of stops would be a process which involves too many calls. For this reason, it has been decided to save a local copy of the bus stops which is updated everyday. This list of stops is retrieved according to the following scheme:

getroutes
for all routes -> getroutedirections
for all routes and directions -> getstops

To this list of stops, the geographic selection filter is applied in order to reduce the size of it.

Once a filtered list is obtained, it is scanned to find all the routes of the selected area by simply checking the route of each stop.
With the list of all the active routes of an area it is now possible to call the getvehicles API to retrieve a list of vehicles. This list too is filtered to have just the vehicles currently inside the selected area.
The list of the active routes is also used to call the getpatterns api, which returns all the paths of those routes.

After a click on the popup of a station a getprediction call is made with the id of the selected station.

Bus Animation

Data about vehicles is updated on the CTA servers just once every minute. This is such a long time for information like position of buses and it is difficult to provide the user useful information. With this kind of update rate it is even difficult to understand which direction they are moving to.
For these reasons it has been decided to simulate the positions of buses in between two consecutive updates. This is pursued through some steps.

Mobile mean adjusted speed

Once an update arrives, data about the previous update is stored in order to compute the speed of the last "step". The speed of the bus is set as a mean between an empirical value (that has been found analyzing the buses) and the mobile mean of the latest 5 speed values of the bus.

Route Repositioning

Position data of the buses is not so accurate and in some particular conditions like, for instance, in downtown among the skyscrapers, the signal is very noisy. For this reason each bus is repositioned on his track just after an update.
This is pursued quite easily: since every bus "is aware" of the route it is traveling on, it is possible to set the position of the bus on the route in a precise way. This task is accomplished by computing the distance between the bus position and each segment which the route path is composed of. The segment with the smaller distance is chosen and the bus is positioned in between it by simply following a perpendicular line toward it.

Animation

Once all this data is available, the bus is animated with a frame-rate of 10fps according to the simulation that has been conducted so far.

New data update

When a new update is retrieved, the bus position is reset to the actual position contained in the data and simulation starts again.

CTA Train

For Trains the “cta Train Tracker API documentation” has been used. These APIs are currently in beta phase so they still contain some imprecisions.

They have been used just for retrieving the predictions of trains approaching each station. This task is accomplished making use of the call ttarrivals?stpid=XXX which returns the arrival time related to a given stop
The list of the stations, instead, is provided by CTA as a static csv file which has been parsed.