The organizers of UWB Hacks the Cloud wanted to build an application that allows our UW Bothell compatriots to continue building their fulfilling relationships with the unofficial campus mascot, the crow.
From that ideal, we built a cloud-hosted datastore with a user experience to delight and inspire young and old alike.
Read about the project and how we built it!
One of the biggest problems faced by humanity is that we have no way to easily get facts about crows. Until now.
We wanted to build an application where:
- Users can get information about crows
- Users can insert new information into the database, and get data from other users
- The website is hosted on AWS S3, and DynamoDB stores the data and is used to retrieve and create entries
- The user goes to our website and has the option to get crow facts or enter new crow facts.
- The user can input their own crow facts (assuming they don't contain any no-no words).
To create a functional user website which achieved the above goals, we used AWS services to host the website. The website's design is described in detail below.
- An S3 bucket was used as the website host, and was configured with a custom domain as well as an SSL certificate for HTTPS traffic.
- This website was designed to give users access to facts, or to make their own.
- These user actions are made via API Gateway calls to Lambda functions, which execute GET or PUT calls.
- Function calls create a link to either the CrowFacts or FunFacts DynamoDB tables.
  - CrowFacts: NoSQL database table that returns species information on crows. Facts are retrieved by a combination of partition and sort key.
    - Primary key: "CrowSpecies" + "habitat"
    - Sort:
      - "description": a short description of the crow's appearance
      - "SubSpecies": specifies what subspecies a crow is, e.g. Florida Crow
      - "scientific": the scientific name of the crow species
      - "image": URL to an image of the crow species
  - FunFacts: A mixture of actual fun facts about crows and user-submitted facts, sanitized at the time of the Lambda call.
    - Primary key: "fact"
    - Sort:
      - "source": who submitted the fact
- The success of each GET or PUT call is logged via AWS CloudWatch.
- If the call succeeds, the information is returned to API Gateway, which in turn returns it to the website.
- Depending on whether the user requested crow information or fun facts, the website renders a visual representation of the information.
  - The website uses the image URLs in the "image" column to show crow photos.
  - Additionally, the website uses the "description" column to give a text description of the crows.
View the Lambda code for the PUT call here.
View the Lambda code for the GET call here.
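As a rough illustration of the lookup described above (this is not the project's actual Lambda code), here is a minimal sketch of fetching one CrowFacts item by its full primary key. It assumes "CrowSpecies" is the partition key and "habitat" is the sort key. The `table` argument is expected to behave like a boto3 DynamoDB `Table` resource; `get_crow_species` and `FakeTable` are hypothetical names, and `FakeTable` exists only so the sketch can run without AWS credentials.

```python
def get_crow_species(table, species, habitat):
    """Look up one CrowFacts item by partition key and sort key."""
    # Assumption: "CrowSpecies" is the partition key and "habitat" the sort key.
    response = table.get_item(Key={"CrowSpecies": species, "habitat": habitat})
    # DynamoDB omits the "Item" key entirely when nothing matches.
    return response.get("Item")


class FakeTable:
    """In-memory stand-in for a DynamoDB table, for local experiments only."""

    def __init__(self, items):
        self._items = {(i["CrowSpecies"], i["habitat"]): i for i in items}

    def get_item(self, Key):
        item = self._items.get((Key["CrowSpecies"], Key["habitat"]))
        return {"Item": item} if item is not None else {}


table = FakeTable([
    {"CrowSpecies": "American Crow", "habitat": "urban", "description": "All black"},
])
found = get_crow_species(table, "American Crow", "urban")
missing = get_crow_species(table, "Fish Crow", "coastal")
```

Passing the table in as an argument keeps the lookup easy to test locally; in a real Lambda function the table would come from `boto3.resource("dynamodb").Table(...)`.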
The diagram of the system flow is below:
The website was written on a local machine and deployed to S3.
The website for this project was built with a straightforward HTML, CSS, and vanilla JavaScript stack. jQuery was used to perform HTTP requests to the database call pipeline. The complete code for the website is available in the project repository.
To streamline site deployment, we used a GitHub Action that copies specified files to the S3 bucket when changes are pushed to master. This was great, because we didn't have to manually update the contents of the S3 bucket during website development.
To set up the custom domain for the site, we did the following:
- Chose a desired URL for the site: https://crowfacts.uwbhacks.com
- Created an S3 bucket named as the full URL and set it up for static website serving, by following these instructions from the AWS docs. Because we already owned the domain, we skipped all steps related to AWS Route 53 in that tutorial.
- In Cloudflare, our DNS management service, we created the `crowfacts` subdomain for `uwbhacks.com` as a `CNAME` record.
- We then updated the bucket access policies with this template provided by Cloudflare.
We used the following tutorials and resources as references:
- AWS Docs: Hosting a Static Website on S3
- AWS Docs: Hosting an S3 Site with a Custom Domain
- Cloudflare docs: Configuring S3 Static Site to Use Cloudflare
The basic API Gateway configuration we did was:
- Create a new API in the AWS Console. We named ours `crowfacts` and chose the `REST` API protocol.
- Create two resources: `getCrowSpecies` and `UserFacts`.
- Ensure that the Lambda functions we wanted to link HTTP methods to were in the AWS account, because the functions have to be available at the time of method creation.
- Under `getCrowSpecies`, create a `GET` method and link it to the corresponding Lambda function we wrote for this purpose.
- Under `UserFacts`, create a `GET` method and link it to the corresponding Lambda function we wrote for this purpose.
- Under `UserFacts`, create a `POST` method and link it to the corresponding Lambda function we wrote for this purpose.
- Under the "Actions" menu dropdown, click "Enable CORS". Accept all default settings and click the "Enable" button.
- Under the "Actions" menu dropdown, click "Deploy API". Create a new deployment stage and click Deploy.
- After deployment, click on "Stages" in the left-hand menu for the API. Expand the resource tree and click on each method to get the URL endpoint for that method & resource.
Ta-da! After completing these steps, we were able to successfully `cURL` the endpoints for our API and validate that it worked as expected.
This application processes user input, and as such, we wanted to validate that input and return errors to the client when the input didn't meet our specifications for database entries. We did this by setting up custom error handling in the API Gateway `POST` method under the `/UserFacts` resource.
To enable the client to receive custom HTTP errors based on the Lambda function's responses, we set up Integration and Method responses. We used this AWS blog post as our primary reference.
Instead of returning a JSON blob from Lambda, we had to raise an exception in the Lambda function to get API Gateway to use custom error responses. An example of the correct syntax is:
import json
import boto3
import os


def lambda_handler(event, context):
    # Raising (rather than returning JSON) lets API Gateway match the error message.
    raise Exception('This is an example exception from Lambda.')
The actual implementation of where we raised exceptions can be found here.
In order to set up error handling from Lambda, we had to first define what status codes we wanted to return to the client. Each status code indicates a different type of error and is accompanied by a different error message which the client site can parse.
We used the following error codes:
- `400: Bad Request`: This error will be returned when the client did not send the required JSON key-value pairs the DynamoDB table needed.
- `403: Forbidden`: This error will be returned when the client's request had valid JSON keys, but the values contained verbiage which is not appropriate for all ages.
- `500: Internal Server Error`: This error will be returned when the Lambda function encountered an error trying to insert the data into the DynamoDB table.
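To make the three cases above concrete, here is a minimal sketch of how a Lambda function could raise one exception per error type. It is not the project's actual code: the function names, the message wording, and the convention of starting each message with the status code are assumptions made for illustration. The "fact" and "source" keys come from the FunFacts table description, and `table` is expected to behave like a boto3 DynamoDB `Table` resource.

```python
REQUIRED_KEYS = ("fact", "source")  # attribute names from the FunFacts table


def validate_fact(body, banned_words):
    """Raise an exception whose message starts with the status code to return."""
    missing = [key for key in REQUIRED_KEYS if key not in body]
    if missing:
        raise Exception("400: Bad Request - missing keys: " + ", ".join(missing))
    text = " ".join(str(body[key]).lower() for key in REQUIRED_KEYS)
    # Assumption: banned_words holds non-empty, lowercase strings.
    if any(word in text for word in banned_words):
        raise Exception("403: Forbidden - fact contains disallowed language")


def put_fact(table, body, banned_words):
    """Validate the request body, then insert it into the table."""
    validate_fact(body, banned_words)
    item = {"fact": body["fact"], "source": body["source"]}
    try:
        table.put_item(Item=item)
    except Exception as err:
        # Any failure while writing is reported to the client as a 500.
        raise Exception("500: Internal Server Error - could not save fact") from err
    return item
```

Starting each message with a distinct code is one way to give API Gateway something simple to match on when it maps Lambda errors to HTTP statuses.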
For each of these method responses, we had to add the `Access-Control-Allow-Origin` header to the status code so that the custom error would fulfill the browser's CORS requirement. Clicking "Enable CORS" in the console action dropdown only enables CORS for the standard `200` status code response.
After completing this step, our method response section looked like this:
After we designated the HTTP error status codes we wanted to use in Method Response, we had to define what Lambda errors mapped to what status codes.
To do this, we created integration responses for each custom method. For each integration response, we provide a regular expression which searches the error message returned from Lambda. If a match for the regular expression is found, API Gateway will return the HTTP error code associated with that regular expression.
We have the option to define post-processing for the Lambda error message, but we elected to return Lambda's raw string; this is called a passthrough response.
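The matching step can be imitated locally with a short sketch. This is only a simulation of the idea, not API Gateway itself: the patterns below are hypothetical, and the sketch assumes that a pattern has to match the whole error message and that an unmatched message falls through to the default `200` response.

```python
import re

# Hypothetical patterns; the real ones are configured per integration response.
ERROR_PATTERNS = [
    (r"400.*", 400),
    (r"403.*", 403),
    (r"500.*", 500),
]


def status_for(error_message):
    """Return the HTTP status for a Lambda error message (None means no error)."""
    if error_message is None:
        return 200
    for pattern, status in ERROR_PATTERNS:
        # Assumption: the pattern must cover the entire message.
        if re.fullmatch(pattern, error_message):
            return status
    return 200  # no pattern matched: fall through to the default response
```

For example, `status_for("403: Forbidden - fact contains disallowed language")` selects the `403` entry, while a message that matches none of the patterns is treated as a default response.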
Our definitions look something like this:
After we created the Integration responses for our custom errors, we had to enable CORS for the errors by hand. We did this by creating a new header mapping for each Lambda Error Regex entry. We set the header to `Access-Control-Allow-Origin` and the mapped value to `'*'`. The syntax for this step is very important; it must respect the CORS protocol.
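We configured this in the console, but the same settings can be written down as arguments for boto3's `put_method_response` and `put_integration_response` calls. The sketch below only builds the argument dictionaries; the helper name and the example IDs are hypothetical. Note that the static header value keeps its single quotes inside the string.

```python
CORS_HEADER = "method.response.header.Access-Control-Allow-Origin"


def cors_error_response_kwargs(rest_api_id, resource_id, status_code, pattern):
    """Build arguments for the method and integration responses of one error code."""
    common = {
        "restApiId": rest_api_id,
        "resourceId": resource_id,
        "httpMethod": "POST",
        "statusCode": str(status_code),
    }
    # Method response: declare the header (the boolean says whether it is required).
    method_kwargs = dict(common, responseParameters={CORS_HEADER: False})
    # Integration response: map the header to the literal value '*'.
    integration_kwargs = dict(
        common,
        selectionPattern=pattern,
        responseParameters={CORS_HEADER: "'*'"},
    )
    return method_kwargs, integration_kwargs


# Hypothetical IDs; real ones come from the API Gateway console or API.
method_kwargs, integration_kwargs = cors_error_response_kwargs(
    "abc123", "res456", 403, "403.*"
)
# With boto3 installed and credentials configured, these could be passed as:
#   client = boto3.client("apigateway")
#   client.put_method_response(**method_kwargs)
#   client.put_integration_response(**integration_kwargs)
```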
After setting this up, our integration responses looked like this:
The jQuery function the client uses to make the request requires a definition of how to handle HTTP errors. We chose to simply display the error message from API Gateway. The definition for how we handled the custom errors can be seen here.
AWS Lambda handles all of CrowFacts' computation. Within AWS Lambda, there are several important things that must be configured in order for CrowFacts' code to operate correctly.
By default, creating a Lambda function from scratch, adding DynamoDB code, and pressing "Test" won't work. Why? Simple, because the Lambda function does not have access to DynamoDB. Cloud systems operate on a least-required permissions model, meaning that in order for your Lambda function to access DynamoDB, you must explicitly grant Lambda access to DynamoDB (and any other AWS services you might want Lambda to access). Refer to AWS documentation on IAM permissions for DynamoDB, and check out [the TL;DR explanation of IAM on the documentation site for this hackathon]({% link _docs/aws_secrets.md %}).
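As an illustration of what "explicitly granting access" can look like, here is a minimal sketch that builds an IAM policy document allowing only item reads and writes on specific tables. The helper name is hypothetical, and the region and account ID in the ARNs are placeholders; the exact actions a function needs depend on the calls it makes.

```python
import json


def dynamodb_policy(table_arns):
    """Build a policy document that allows reading and writing items in the given tables."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                "Resource": list(table_arns),
            }
        ],
    }


# Placeholder region and account ID; substitute the values for your own tables.
example = dynamodb_policy([
    "arn:aws:dynamodb:us-west-2:123456789012:table/CrowFacts",
    "arn:aws:dynamodb:us-west-2:123456789012:table/FunFacts",
])
print(json.dumps(example, indent=2))
```

Listing individual tables and actions, rather than using wildcards, keeps the function's permissions close to what it actually needs.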
- In order to keep the code looking clean, and to avoid writing publicly available code that contains sensitive information (including naughty words), we can use Environment Variables. These are exactly like environment variables on your computer. They're variables that you can access in your code like any other, but since they're not declared or otherwise modified in the source code, they're invisible and can't be accidentally leaked (say when you upload your code to a public git repository).
- After you create a Lambda function in the AWS portal, you will be greeted with the function's page. Here you can edit the code directly and modify the function's settings. Right below the code window, you will see a menu item for Environment Variables. Click Edit to modify them. The key is the variable name, and the value is the variable's value. Click Save to save these variables in the function. Now you can access the environment variables in the Lambda function's code by calling it using the key. See AWS Documentation on Environment Variables for more info.
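In Python, environment variables are read through `os.environ`. The sketch below assumes a hypothetical variable named `BANNED_WORDS` that holds a comma-separated list; the variable name and the format are illustrative choices, not necessarily what CrowFacts uses.

```python
import os


def load_banned_words(variable="BANNED_WORDS"):
    """Read a comma-separated word list from an environment variable."""
    # A missing variable is treated as an empty list rather than an error.
    raw = os.environ.get(variable, "")
    # Drop surrounding whitespace and empty entries, and normalize case.
    return [word.strip().lower() for word in raw.split(",") if word.strip()]
```

Using `os.environ.get` with a default means the function still runs when the variable has not been configured, which is convenient for local testing.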
The information presented on the website was stored in a DynamoDB database table. This is a NoSQL database, and there are important factors to consider when implementing this in a project.
NoSQL databases do not require a defined relational schema as SQL databases do. They are flexible, allowing attributes to be introduced as needed. If the relationships between data points are important, then consider AWS's relational database options such as Amazon RDS.
Your table is the place where you store your information. Each table requires a primary key, which can be a single attribute (a partition key) or a combination of a partition key and a sort key. Consider your choice carefully, as there is no way to change your primary key once the table is created. Reference the list of reserved words for DynamoDB to make sure you are not using one of those words as an attribute name, which will cause problems down the line.
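To show where the primary key choice is made, here is a minimal sketch that builds the arguments for boto3's `create_table` call. The helper name is hypothetical, and the sketch assumes both key attributes are strings and that on-demand billing is acceptable; only key attributes are declared up front, since other attributes need no schema.

```python
def table_definition(name, partition_key, sort_key=None):
    """Build create_table arguments for a table with string keys."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # "RANGE" is DynamoDB's name for the sort key.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        "BillingMode": "PAY_PER_REQUEST",
    }


crow_facts = table_definition("CrowFacts", "CrowSpecies", "habitat")
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("dynamodb").create_table(**crow_facts)
```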
Interaction with other AWS services requires setting up permissions so that those services may interact with your database. AWS utilizes IAM permissions, which can be made as broad or as narrow as your use case requires. Setting your permissions up is a key factor for being able to use Lambda functions on your database or make API Gateway calls.
There are multiple approaches for handling the import and export of your DynamoDB information. See the following links for some options on how to implement it.
AWS Resources, SDK Links, Tutorials, and Helpful Tidbits: Importing Large Datasets
This document goes over some options for importing and exporting data in DynamoDB. It includes functional code for Lambda calls to import/export a .JSON file.
What is AWS Data Pipeline?
This is the AWS documentation on Data Pipeline, a service which can be used to import/export database information. It can become costly quickly, so use it with caution.
Mockaroo
A good website to use to get some mock data to use when testing your import/export methods.
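For small datasets, one simple import approach is to load a JSON array and write it with a batch writer. The sketch below is not the code from the resources listed above; it assumes the input is a JSON array of objects whose keys match the table's attributes, and that `table` behaves like a boto3 DynamoDB `Table` resource. The function name is hypothetical.

```python
import json


def import_items(table, json_text):
    """Write every object in a JSON array to the table and return the count."""
    items = json.loads(json_text)
    # batch_writer groups the puts into batched requests behind the scenes.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    return len(items)
```

Mock data generated for testing can be fed through the same function before trying it on real entries.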
- Data Pipeline cost
- Reserved words
- More work than initially thought