Welcome to the extra info page. Here you'll find some high-level information on App Development, Hackathon Tips,
SQL, Python, R and Data Visualisation that we think may be of interest.
You can use any tools, languages or systems that you like during the challenges; these are just some suggestions! If you have any queries or questions, just give us a shout.
(Note that AIB is not responsible for the content or actions of any external websites that are linked to from this site.)
Never built an app before? Don't worry, it's not as difficult as you may think, and if you can code
already, you're halfway there. Here are some pointers to get you started:
1. App development is free. Yes you need a computer and yes you need to pay Apple ($99) or Google ($25) if you want to put your app into their App Stores for others to download — but you can still build an app and put it on your phone or friends’ phones for free.
2. To make Apple (iOS) apps you use a software package called Xcode. It’s free to download from the App Store on your Mac (you can only build iOS apps on a Mac, not a Windows machine). For Android apps, you use Android Studio — again a free download, from Google. You can use Android Studio on a Mac or Windows machine.
3. Xcode and Android Studio are a bit like Microsoft Word or Excel: they’re software applications designed to help you do something. Excel helps you make spreadsheets, Xcode helps you make apps. Both are designed to be easy to use with loads of useful tools and features.
4. You can learn to code for free. There are plenty of fantastic sites and resources that you can buy / subscribe to, but there are just as many free ones. All you really need is time — time to learn coding, how it all works, to start off with the basics, to practice and do exercises and gradually build up your skills and experience.
5. “Coding” apps basically means using a programming language to specify what the app does. iOS apps use a language called Swift, Android apps use one called Java. There are lots of alternative platforms to make apps (Xamarin, React Native, PhoneGap, Titanium etc.) but for beginners I’d suggest learning the base “native” ways: Swift or Java.
So where do you start? If you have a Mac and an iPhone, I’d recommend learning how to make iOS apps using the Swift programming language, with Xcode. From my experience teaching both iOS and Android to students with no previous coding experience, iOS is easier to learn.
If you only have a Windows machine / only an Android phone, learn how to make Android apps using the Java language with Android Studio.
Where can you start to learn? RayWenderlich.com is one of the best sites around, with lots of free tutorials, along with ones you can buy as well. Google’s official Android tutorials are great for beginners as well. A small warning - both Xcode and Android Studio are quite large, so be careful where you download them!
We'll have our engineers on hand all day at the DataHack to help out; they don't get out much so are quite looking forward to it. I'll also be available if anyone wants to try to beat me at Street Fighter 2 on our Sega Mega Drive. #notgoingtohappen #hadouken
I've participated in and judged loads of hackathons, big and small. Here are some hopefully useful tips I've learned along the way.
1. It's about having fun and learning, not just winning. If you win, brilliant, but I love hackathons as they're a chance to build something new and learn new technologies or skills.
2. The best ideas are usually about solving problems or meeting needs. You can build the most beautiful and amazing app ever seen, but if users can't actually see a reason to use it, there's no point. Unless it's a pure entertainment app, which is itself a need.
3. Keep focused on your main objective. It's easy to get side-tracked, go off on tangents, or spend too long on individual features - and you may end up with an unfinished product. Remember what the goal is and use that as your compass.
4. Leave yourself time to make it look good and easy to use. Not everyone is a designer but it's worth spending time trying to make your solution beautiful and have a great user experience. I've seen plenty of great ideas and well coded apps let themselves down at demo time because of bad UI/UX.
5. Also leave yourself time to practice your pitch. You'll only have a few minutes to tell the judges all about your idea, the problem it solves etc., so make sure when you talk to them that it's not the first time you're saying it!
6. Talk to the other teams. While it's a competition (with some awesome prizes!), one of the best things about hackathons is meeting people and learning about more than just what you're doing. It's highly unlikely someone's going to steal your idea and you'll gain more by networking and chatting than you will by keeping to yourself.
7. Ask if you need help. We're going to have engineers there all day to help out, so don't be afraid to ask us anything. We want students to learn from the DataHack so if you want to ask about a coding issue, or your app's design, the best way to pitch it or anything else, we'll be more than happy to help.
8. Seriously, have fun!
(by Andy O'Sullivan, Chief Thought Architect)
SQL (Structured Query Language) is a database computer language designed for the retrieval and management of data in a relational database.
Let’s face it - if one does not have SQL skills these days, getting a job in a data team would be a daunting challenge.
And why is that, you may wonder? Relational databases have changed the way we think about data, how we store it and how we retrieve it.
It happened a long time ago, but to date RDBMS (Relational Database Management Systems) are the most popular storage solutions in pretty much every company on earth. Yes, there is NoSQL and yes, there is Hadoop, but you would be surprised to know how many are still doing it the old-school way.
So what is it that is so appealing about SQL? I would say the first thing is its relatively easy learning curve. Most engineers go from no SQL skills to proficient in a short amount of time. If you are a novice in this language, imagine the way you would ask a table in a database to give you data, something like select these things from that table. And that is almost exactly how you would code it in SQL.
The most important things to know in SQL are three main keywords:
SELECT, FROM and WHERE
Every SQL query you write will have at least the first two. Show everything in the table with *:
SELECT * FROM table_name WHERE [condition];
Or select specific columns:
SELECT ID, NAME, AGE FROM CUSTOMERS WHERE SALARY > 2000;
Aggregate over columns with functions like SUM, COUNT or AVG. In a LIKE pattern in the WHERE clause, the % sign stands for zero, one or multiple characters:
SELECT COUNT(ID) FROM CUSTOMERS WHERE NAME LIKE '%Cottica';
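If you'd like to try queries like these without installing a database server, here is a minimal sketch using Python's built-in sqlite3 module. The CUSTOMERS rows are made up purely for illustration, and SQLite is only standing in for whatever RDBMS you actually use.

```python
import sqlite3

# In-memory SQLite database; the table contents are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, SALARY REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (1, "A. Cottica", 40, 3000.0),
        (2, "B. Murphy", 31, 1500.0),
        (3, "C. Cottica", 25, 2500.0),
    ],
)

# Select specific columns, filtered with WHERE
high_earners = conn.execute(
    "SELECT ID, NAME, AGE FROM CUSTOMERS WHERE SALARY > 2000"
).fetchall()

# Aggregate with COUNT; '%' matches any run of characters before 'Cottica'
cottica_count = conn.execute(
    "SELECT COUNT(ID) FROM CUSTOMERS WHERE NAME LIKE '%Cottica'"
).fetchone()[0]

print(high_earners)
print(cottica_count)
conn.close()
```

Changing the sample rows or the WHERE conditions and re-running is a quick way to get a feel for how each clause affects the result.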
Another popular aspect of the SQL language is the fact that across a multitude of different RDBMS technologies, SQL maintains querying standards. That way, moving from one platform to another is not that difficult, bar some subtle differences that are proprietary to each technology.
If you are curious about RDBMS flavours out there, let me throw a few names at you: SQL Server, Oracle, Teradata, Vertica, HBase, Hive … they all share SQL coding standards. Furthermore, you will find that when you develop an application in C, Java, R or PHP, if you interface at any level with a database, you will have to code your queries using SQL.
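As a rough illustration of what interfacing with a database looks like from application code, here is a sketch in Python using the built-in sqlite3 module. The function name `customers_earning_over` and the sample rows are hypothetical; drivers for other languages and databases tend to follow the same pattern of sending a SQL string plus parameters.

```python
import sqlite3


def customers_earning_over(conn, min_salary):
    """Return (ID, NAME) rows for customers whose SALARY exceeds min_salary."""
    # The '?' placeholder lets the driver pass the value safely,
    # rather than pasting it into the SQL string by hand.
    cursor = conn.execute(
        "SELECT ID, NAME FROM CUSTOMERS WHERE SALARY > ? ORDER BY ID",
        (min_salary,),
    )
    return cursor.fetchall()


# Hypothetical sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, SALARY REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [(1, "Aoife", 1800.0), (2, "Brian", 2600.0), (3, "Ciara", 3200.0)],
)

result = customers_earning_over(conn, 2000)
print(result)
```

The query itself is still plain SQL; the application language only decides when to run it and what to do with the rows that come back.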
Below you can find some links to get you started in understanding the background and life cycle of the most popular querying language on the planet:
(by Max Cottica, Head of Data Science and Big Data Solutions)
Python is a very high-level, dynamic programming language, emphasizing code readability and ease of
use. It allows you to program in multiple different styles depending on your preferences, but tries
to stick to the principle that there should be one, and preferably only one, obvious way to do any particular thing.
As a result of these features, Python has been widely adopted in the data science community. Some really fantastic data science libraries have been written for Python. The Anaconda Python distribution has all of the most common data science libraries packaged, so you'll rarely find yourself lacking the appropriate functionality! I recommend looking at Pandas in particular.
Having said that, Python is more than capable of doing some pretty impressive data analysis in a very small amount of code using only the core libraries. We'll try creating a program that counts the frequencies of each unique word in a file to showcase this. Note that anything after a '#' is a comment and will be ignored when running the program.
Save this code as HelloDataScience.py (or something else like a.py if you hate typing), then in the command prompt run:
python HelloDataScience.py | more
The `more` bit ensures the output doesn't flood the terminal screen immediately.
import re  # For regular expressions
from collections import Counter  # For counting hashable types (like strings)

# Open the file for reading and call it 'f'. The file will only be open in this
# `with` block, and will be automatically closed at the end of the block, so you
# don't have to worry about closing it manually later.
with open('path/to/file') as f:
    # This regex, when used, will match all characters which are not
    # alphanumeric except for the single-quote character. Remember, regular
    # expressions are your friend, especially when dealing with messy data!
    # Make sure you know how to use them effectively.
    regex = re.compile(r"[^\w']")

    # Read the file as a string, split by whitespace (including tabs and
    # newlines), then for each word in that list, strip unwanted characters
    # using the regex above.
    words = [regex.sub("", word) for word in f.read().split()]

# Count each value in the list
word_counts = Counter(words)

# Print them in order of frequency, most common first
for word_count in word_counts.most_common():
    print(" " + str(word_count))
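To see what the regex and the Counter are doing without needing a file, here is a small variation on the same idea that works on a string. The sample text and the `count_words` helper are made up for illustration, and lower-casing the words is an extra step that the program above does not do.

```python
import re
from collections import Counter


def count_words(text):
    """Count word frequencies in text, ignoring case and punctuation."""
    # Same pattern as the file-based program: remove anything that is
    # not a word character or a single quote
    regex = re.compile(r"[^\w']")
    cleaned = (regex.sub("", word).lower() for word in text.split())
    # Drop tokens that were pure punctuation and are now empty
    return Counter(word for word in cleaned if word)


sample = "Don't panic! The data, the whole data -- and nothing but the data."
counts = count_words(sample)
for word, count in counts.most_common(3):
    print(word, count)
```

Filtering out the empty strings matters more than it looks: a token such as `--` is reduced to nothing by the regex and would otherwise be counted as a word.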
Now a short example using Pandas. Let's read in a CSV file with Pandas so we can test something. If you want to try this yourself, download the sample real estate transactions data from here (second one down) and open the Anaconda IPython interpreter in the command prompt at the directory you saved the file in.
First, we'll read in the CSV file to a pandas DataFrame. The DataFrame is the main unit of functionality in pandas:
import pandas as pd
df = pd.read_csv('path/to/file.csv')  # Path to the CSV file you downloaded
Done. That was easy! To see the DataFrame, just run `df`. By the way, you can also do other awesome things like `read_excel`, `read_json`, and even `read_clipboard`! It's always a good idea to have pandas describe your DataFrame with `df.describe()` if you just want a quick look at some basic statistical measures.
From this you can find things like the 'centerpoint' of all the houses by looking at the average latitude and longitude (although it's not *really* the center, since the Earth is ellipsoidal), or the minimum house price overall ($1551!!). OK, now let's say we want to find the cheapest house listed in Sacramento with more than 3,500 sqft of land:
house = df.loc[df[(df.sq__ft > 3500) & (df.city == 'SACRAMENTO')]['price'].idxmin()]
Let's figure out what this means from the inside out:
# Get the rows for which sq__ft > 3500 and city == 'SACRAMENTO'
candidates = df[(df.sq__ft > 3500) & (df.city == 'SACRAMENTO')]
# Get the prices for those rows
prices = candidates['price']
# Get the minimum of those prices as a row index
lowest_priced_house_index = prices.idxmin()
# Get the row at that index
house = df.loc[lowest_priced_house_index]
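If you don't have the CSV to hand, the same chain of steps can be tried on a tiny made-up DataFrame. The rows below are invented (only the column names mirror the real dataset), and this sketch uses `.loc` for the label-based row lookup.

```python
import pandas as pd

# Invented sample rows; only the column names match the real dataset
df = pd.DataFrame(
    {
        "city": ["SACRAMENTO", "SACRAMENTO", "ELK GROVE", "SACRAMENTO"],
        "sq__ft": [4000, 3600, 5000, 1200],
        "price": [450000, 300000, 250000, 90000],
    }
)

# Boolean mask: True for rows matching both conditions
mask = (df.sq__ft > 3500) & (df.city == "SACRAMENTO")
candidates = df[mask]

# idxmin returns the index label of the smallest price among the candidates
lowest_priced_house_index = candidates["price"].idxmin()

# Look the full row up by its label
house = df.loc[lowest_priced_house_index]
print(house)
```

Note that the cheapest row overall (the 1,200 sqft house) and the cheapest large house (in Elk Grove) are both excluded, because the mask is applied before the minimum is taken.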
Now we have our house selected. Let's see where it is! We can use the latitude, longitude pair or the address.
import subprocess as sbp
# lat, lng (converted to strings so they can be joined into the URL)
lat, lng = str(house['latitude']), str(house['longitude'])
sbp.run(["start", "chrome", "https://www.google.ie/maps/place/" + lat + "," + lng], shell=True)
# address
address = house['street'] + ' ' + house['city']
sbp.run(["start", "chrome", "https://www.google.ie/maps/place/" + address], shell=True)
The address will probably give you better information. If you're on Linux, replace `"start", "chrome"` with just `"firefox"` and remove `shell=True`.
I would encourage you to take a look at the 10 Minutes to Pandas section on the pandas website. It gives you a very quick tour of some of the most common things you might want to use it for.
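If you'd rather not depend on a particular browser or operating system, one alternative (not what the snippet above does) is Python's standard `webbrowser` module, which opens a URL in the default browser. The `maps_url` helper is a hypothetical name, and the coordinates and address are placeholder values; `urllib.parse.quote` takes care of spaces in the address.

```python
import webbrowser
from urllib.parse import quote


def maps_url(place):
    """Build a Google Maps URL for a 'lat,lng' string or a street address."""
    # quote() percent-encodes spaces and other unsafe characters; commas are kept
    return "https://www.google.ie/maps/place/" + quote(place, safe=",")


# Placeholder values standing in for house['latitude'], house['longitude'], etc.
lat, lng = 38.5, -121.4
coords_url = maps_url(str(lat) + "," + str(lng))
address_url = maps_url("123 EXAMPLE ST SACRAMENTO")

print(coords_url)
print(address_url)
# Uncomment to open the page in your default browser:
# webbrowser.open(address_url)
```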
I know this is an introduction to Python for Data Science, but I feel I need to emphasize a more language-agnostic point: if you don't know how to use regular expressions yet, it is absolutely imperative that you learn how to use them! Dealing with textual data goes from being a massive headache to a breeze, even after just learning the basics. Here is a great Stack Overflow answer that covers a lot of the basics of regex, with lots of links included. Once you have a rudimentary grasp, try solving the problems on [this](http://regex.alf.nu/) website as practice; I can't recommend it enough.
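As a small taste of why regular expressions are worth learning, here is a sketch that pulls dates and prices out of a messy line of text with Python's `re` module. The sample string and the patterns are invented for illustration and are not robust enough for real data.

```python
import re

messy = "Sold 2008-05-21: 3 bed, 2 bath for $59,222 (was $68,212 on 2008-05-15)"

# Dates in YYYY-MM-DD form: four digits, dash, two digits, dash, two digits
dates = re.findall(r"\d{4}-\d{2}-\d{2}", messy)

# Dollar amounts: a '$' followed by digits and optional comma groups
prices = re.findall(r"\$\d+(?:,\d{3})*", messy)

# Strip the '$' and commas so the amounts can be treated as numbers
amounts = [int(re.sub(r"[$,]", "", price)) for price in prices]

print(dates)
print(prices)
print(amounts)
```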
(by Conor Reynolds, former intern in AIB Data Science.)
Visualizing your data is an important early step in the analysis of any dataset.
It allows you to get a better understanding of the data you are working with and to identify
features of your dataset such as the distribution of data or the presence of any outliers.
Data visualization is also an excellent way of conveying key messages in the presentation of findings that come about as a result of analysis. The visualizations can allow decision makers to understand concepts and spot trends in the data that may not be apparent from looking at statistics or at the data itself.
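Even before reaching for a charting tool, a rough picture of a distribution can be sketched in plain text. The example below is only a stand-in for a proper histogram: it uses nothing but the Python standard library, and the prices, the bin width and the `text_histogram` helper are all made up for illustration.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text bar per bin, where each '#' is one value in that bin."""
    # Map each value to the lower edge of its bin, e.g. 130 -> 100 for width 100
    bins = Counter((value // bin_width) * bin_width for value in values)
    lines = []
    for lower in range(min(bins), max(bins) + bin_width, bin_width):
        lines.append(f"{lower:>5} | " + "#" * bins[lower])
    return lines


# Invented house prices in thousands; 950 is a deliberate outlier
prices = [120, 130, 150, 180, 210, 220, 240, 260, 310, 950]
for line in text_histogram(prices, 100):
    print(line)
```

The run of empty bins between the main cluster and the last bar is what makes the outlier stand out, which is much harder to spot by scanning the raw list of numbers.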
Some useful tools for data visualization include:
1. Tableau
Tableau’s Desktop edition is a very useful platform that allows for the building of data visualizations in a simple and intuitive manner. An annual student subscription can be found here.
See http://leafletjs.com/ for more info.
4. Shiny by RStudio
Shiny is a web application framework for R. It allows you to create fully interactive visualizations through the use of a number of different plugins including Leaflet, Dygraphs and Highcharts. These applications can subsequently be hosted on the web for interaction by users.
(By Killian Watchorn, Data Scientist with AIB)