CrowFacts, an Example Project for AWS

The organizers of UWB Hacks the Cloud wanted to build an application that allows our UW Bothell compatriots to continue building their fulfilling relationships with the unofficial campus mascot, the crow.

From that ideal, we built a cloud-hosted datastore with a user experience to delight and inspire young and old alike.

Read about the project and how we built it!

Goals

One of the biggest problems faced by humanity is that we have no way to easily get facts about crows. Until now.

We wanted to build an application where:

  • Users can get information about crows
  • Users can insert new information into the database, and get data from other users
  • The website is hosted on AWS S3, and all data lives in DynamoDB tables used to retrieve and create entries

User Experience

  1. The user goes to our website and has the option to get crow facts or enter new crow facts.

  2. The user can input their own crow facts (assuming they don't contain any no-no words).

Demo GIF of the user experience (via GIPHY)

Architecture

To create a functional website that achieves the above goals, we built everything on AWS services. The design is described in detail below.

  • An S3 bucket was used as the website host, and was configured with a custom domain as well as SSL certification for HTTPS traffic.
  • This website was designed to give users access to facts, or to make their own.
  • These user actions are made via API gateway calls to Lambda functions, which execute GET or PUT calls.
  • Function calls create a link to either the CrowFacts or FunFacts DynamoDB tables (see the sketch after this list).
    • CrowFacts: a NoSQL table that returns species information on crows. Facts are retrieved by a combination of partition and sort key.
      • Primary key: "CrowSpecies" (partition) + "habitat" (sort)
      • Other attributes:
        • "description": a short description of the crow's appearance
        • "SubSpecies": specifies what subspecies a crow is, e.g. Florida Crow
        • "scientific": the scientific name of the crow species
        • "image": URL to an image of the crow species
    • FunFacts: a mixture of actual fun facts about crows and user-submitted facts, sanitized at the time of the Lambda call.
      • Primary key: "fact"
      • Other attributes:
        • "source": who submitted the fact
  • The success of each GET or PUT call is logged via AWS CloudWatch.
  • If successful, the information is returned to API Gateway, which in turn returns it to the website.
  • Depending on whether the user requested crow species information or fun facts, the website renders a visual representation of the data.
    • The website uses the URLs in the "image" column to show crow photos.
    • It also uses the "description" column to give a written description of the crows.
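For a concrete sense of how the Lambda functions talk to these tables, here is a minimal sketch of a lookup against the CrowFacts table. The key names come from the schema above; the specific values are illustrative, not taken from the real dataset.

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('CrowFacts')

# Retrieve one item by its composite primary key
# (partition key "CrowSpecies" + sort key "habitat").
# The key values here are illustrative.
response = table.get_item(
    Key={
        'CrowSpecies': 'American Crow',
        'habitat': 'Pacific Northwest'
    }
)
item = response.get('Item')  # None if no matching entry exists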

Visit the website here.

View the Lambda code for the PUT call here.

View the Lambda code for the GET call here.

The diagram of the system flow is below:

Visual representation of the architectural design of CrowFacts

Implementation Notes

S3 & Website Construction

Website Creation

The website was written on a local machine and deployed to S3.

The website for this project was built with a straightforward HTML, CSS, and vanilla JavaScript stack. jQuery was used to perform the HTTP requests to the API Gateway endpoints described below. The complete code for the website is available in the project repository.

To streamline site deployment, we used a GitHub Action that copies the specified files to the S3 bucket whenever changes are pushed to master. This was great, because we never had to update the bucket's contents by hand during website development.
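Under the hood, the Action is just copying files into the bucket. A rough boto3 equivalent is sketched below; the bucket name matches our site, but the file list and content types are illustrative.

import boto3

s3 = boto3.client('s3')

# Illustrative file list; the real Action copies the files named in
# the workflow configuration. Setting ContentType ensures the browser
# renders the files instead of downloading them.
for filename, content_type in [('index.html', 'text/html'),
                               ('style.css', 'text/css'),
                               ('main.js', 'application/javascript')]:
    s3.upload_file(
        filename,
        'crowfacts.uwbhacks.com',
        filename,
        ExtraArgs={'ContentType': content_type}
    )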

Domain Configuration

To set up the custom domain for the site, we did the following:

  1. Chose a desired URL for the site: https://crowfacts.uwbhacks.com
  2. Created an S3 bucket named after the site's hostname (crowfacts.uwbhacks.com) and set it up for static website serving, by following these instructions from the AWS docs. Because we already owned the domain, we skipped all steps related to AWS Route 53 in that tutorial.
  3. In Cloudflare, our DNS management service, we created the crowfacts subdomain for uwbhacks.com as a CNAME record.
  4. We then updated the bucket access policies with this template provided by Cloudflare.
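The console steps in that tutorial boil down to enabling static website hosting on the bucket. A minimal boto3 sketch of that step (the error page name is illustrative):

import boto3

s3 = boto3.client('s3')

# The bucket name matches the site's hostname, which is required for
# static website hosting behind a CNAME.
s3.put_bucket_website(
    Bucket='crowfacts.uwbhacks.com',
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'ErrorDocument': {'Key': 'error.html'}  # illustrative error page
    }
)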

We used the following tutorials and resources as references:

API Gateway

Basic Configuration

The basic API Gateway configuration we did was:

  1. Create a new API in the AWS Console. We named ours crowfacts and chose the REST API protocol.
  2. Create two resources: getCrowSpecies and UserFacts.
  3. Ensure that the Lambda functions we wanted to link HTTP methods to were in the AWS account, because the functions have to be available at the time of method creation.
  4. Under getCrowSpecies, create a GET method and link it to the corresponding Lambda function we wrote for this purpose.
  5. Under UserFacts, create a GET method and link it to the corresponding Lambda function we wrote for this purpose.
  6. Under UserFacts, create a POST method and link it to the corresponding Lambda function we wrote for this purpose.
  7. Under the "Actions" menu dropdown, click "Enable CORS". Accept all default settings and click the "Enable" button.
  8. Under the "Actions" menu dropdown, click "Deploy API". Create a new deployment stage and click Deploy.
  9. After deployment, click on "Stages" in the left-hand menu for the API. Expand the resource tree and click on each method to get the URL endpoint for that method & resource.

Ta-da! After completing these steps, we were able to successfully cURL the endpoints for our API and validate that it worked as expected.
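The same smoke test can be scripted. Here is a minimal sketch using Python's requests library; the base URL is a placeholder for the invoke URL shown on the stage page, and the payload keys follow the FunFacts schema.

import requests

# Placeholder; use the invoke URL from your deployment stage.
BASE = 'https://<api-id>.execute-api.<region>.amazonaws.com/<stage>'

# Fetch crow species information.
print(requests.get(f'{BASE}/getCrowSpecies').json())

# Submit a new user fact.
resp = requests.post(f'{BASE}/UserFacts',
                     json={'fact': 'Crows can recognize human faces.',
                           'source': 'example user'})
print(resp.status_code, resp.text)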

Custom HTTP Error Handling

This application processes user input, and as such, we wanted to validate that input and return errors to the client when the input didn't meet our specifications for database entries. We did this by setting up custom error handling in the API Gateway POST method under the /UserFacts resource.

To enable the client to receive custom HTTP errors based on the Lambda function's responses, we set up Integration and Method responses. We used this AWS blog post as our primary reference.

Lambda Function Configuration

Instead of returning a JSON blob from Lambda, we had to raise an exception in the Lambda function to get API Gateway to use custom error responses. An example of the correct syntax is:

import json
import boto3
import os

def lambda_handler(event, context):
    # Raising an exception, rather than returning a JSON error blob,
    # lets API Gateway match the message against a Lambda Error Regex
    # and map it to a custom HTTP status code.
    raise Exception('This is an example exception from Lambda.')

The actual implementation of where we raised exceptions can be found here.

Method Responses from API Gateway to the Client

In order to set up error handling from Lambda, we had to first define what status codes we wanted to return to the client. Each status code indicates a different type of error and is accompanied by a different error message which the client site can parse.

We used the following error codes:

  • 400: Bad Request. Returned when the client did not send the required JSON key pairs the DynamoDB table needs.
  • 403: Forbidden. Returned when the client's request had valid JSON keys, but the values contained verbiage which is not appropriate for all ages.
  • 500: Internal Server Error. Returned when the Lambda function encountered an error trying to insert the data into the DynamoDB table.
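To make these cases distinguishable, each exception message in the Lambda function can start with a recognizable prefix for the Lambda Error Regex (covered below) to match. This is a sketch of that idea, not our exact code; the "BadRequest"/"Forbidden" prefixes and the BANNED_WORDS variable name are illustrative.

import os

def lambda_handler(event, context):
    # Assumes a non-proxy integration, so `event` holds the parsed
    # request body.
    if 'fact' not in event or 'source' not in event:
        raise Exception('BadRequest: missing required JSON keys')  # -> 400

    # Word list kept in an environment variable (see the Lambda
    # section below); illustrative variable name.
    banned = os.environ.get('BANNED_WORDS', '').split(',')
    if any(word and word in event['fact'].lower() for word in banned):
        raise Exception('Forbidden: fact is not appropriate for all ages')  # -> 403

    # Any unhandled exception raised by the DynamoDB call itself can
    # be caught by a catch-all regex and mapped to 500.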

For each of these method responses, we had to add the Access-Control-Allow-Origin header to the status code so that the custom error would fulfill the browser's CORS requirement. Clicking "Enable CORS" in the console action dropdown only enables CORS for the standard 200 status code response.
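We did this step in the console, but the equivalent boto3 call shows exactly what is being declared. The IDs below are placeholders.

import boto3

apigateway = boto3.client('apigateway')

# Declare each custom status code on the POST method and expose the
# CORS header as an available response parameter (False = optional).
for status_code in ('400', '403', '500'):
    apigateway.put_method_response(
        restApiId='<rest-api-id>',
        resourceId='<userfacts-resource-id>',
        httpMethod='POST',
        statusCode=status_code,
        responseParameters={
            'method.response.header.Access-Control-Allow-Origin': False
        }
    )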

After completing this step, our method response section looked like this:

Method Response Configuration for Custom Errors to Client

Integration Responses from Lambda to API Gateway

After we designated the HTTP error status codes we wanted to use in Method Response, we had to define which Lambda errors mapped to which status codes.

To do this, we created an integration response for each custom status code. Each integration response includes a regular expression that API Gateway matches against the error message returned from Lambda. If a match is found, API Gateway returns the HTTP status code associated with that regular expression.

We had the option to define post-processing for the Lambda error message, but we elected to return Lambda's raw string; this is called a passthrough response.

Our definitions look something like this:

Configuration for integration responses in API Gateway

After we created the Integration responses for our custom errors, we had to enable CORS for the errors by hand. We did this by creating a new header mapping for each Lambda Error Regex entry. We set the header to Access-Control-Allow-Origin and the mapped value to '*'. The syntax for this step is very important; it must respect the CORS protocol.
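Again, we configured this in the console, but a boto3 sketch makes the pieces explicit: selectionPattern is the Lambda Error Regex, and the mapped header value is the literal string '*' wrapped in single quotes, per API Gateway's mapping syntax. The IDs and regexes below are placeholders.

import boto3

apigateway = boto3.client('apigateway')

# Placeholder regexes; they should match the exception messages the
# Lambda function raises for each error case.
for status_code, regex in [('400', 'BadRequest.*'),
                           ('403', 'Forbidden.*'),
                           ('500', 'An error occurred.*')]:  # boto3 ClientError prefix
    apigateway.put_integration_response(
        restApiId='<rest-api-id>',
        resourceId='<userfacts-resource-id>',
        httpMethod='POST',
        statusCode=status_code,
        selectionPattern=regex,
        # Single quotes mark a literal value in the mapping syntax;
        # this is what satisfies the browser's CORS check.
        responseParameters={
            'method.response.header.Access-Control-Allow-Origin': "'*'"
        }
    )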

After setting this up, our integration responses looked like this:

Integration Response Configuration for Custom Errors from Lambda

Handling Error Responses in the Client

The jQuery function used by the client to make the request requires a definition of how to handle HTTP errors. We chose to simply display the error message from API Gateway. The definition for how we handled the custom errors can be seen here.

Lambda

AWS Lambda handles all of CrowFacts' computation. Within AWS Lambda, there are several important things that must be configured in order for CrowFacts' code to operate correctly.

Give Lambda access to DynamoDB

By default, creating a Lambda function from scratch, adding DynamoDB code, and pressing "Test" won't work. Why? Because the Lambda function does not have access to DynamoDB. Cloud systems operate on a least-privilege permissions model, meaning that in order for your Lambda function to access DynamoDB, you must explicitly grant Lambda access to DynamoDB (and to any other AWS services you might want Lambda to reach). Refer to the AWS documentation on IAM permissions for DynamoDB, and check out [the TL;DR explanation of IAM on the documentation site for this hackathon]({% link _docs/aws_secrets.md %}).
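As a sketch of what such a grant can look like, here is an inline policy attached to the function's execution role via boto3. The role, policy, and resource names are illustrative; the console's IAM editor achieves the same thing.

import json
import boto3

iam = boto3.client('iam')

# Illustrative names; scope actions and resources down to exactly
# what the function needs (least privilege).
policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Action': ['dynamodb:GetItem', 'dynamodb:PutItem', 'dynamodb:Query'],
        'Resource': 'arn:aws:dynamodb:*:*:table/CrowFacts'
    }]
}

iam.put_role_policy(
    RoleName='crowfacts-lambda-role',
    PolicyName='crowfacts-dynamodb-access',
    PolicyDocument=json.dumps(policy)
)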

Set up environment variables for Lambda functions

  • In order to keep the code looking clean, and to avoid publishing code that contains sensitive information (including naughty words), we can use environment variables. These work exactly like environment variables on your computer: you can access them in your code like any other variable, but since they're not declared or otherwise modified in the source code, they can't be accidentally leaked (say, when you upload your code to a public git repository).

  • After you create a Lambda function in the AWS portal, you will be greeted with the function's page. Here you can edit the code directly and modify the function's settings. Right below the code window, you will see a menu item for Environment Variables. Click Edit to modify them: the key is the variable name, and the value is the variable's value. Click Save to store the variables in the function. You can then access an environment variable in the Lambda function's code by referencing its key. See the AWS documentation on environment variables for more info.
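Reading the variables in code looks like this. The variable names are illustrative (the real function used environment variables for, among other things, the list of disallowed words).

import os

# Values set in the Lambda console as described above.
TABLE_NAME = os.environ['TABLE_NAME']
BANNED_WORDS = os.environ.get('BANNED_WORDS', '').split(',')

def lambda_handler(event, context):
    # Use the values like any other Python variable; nothing sensitive
    # appears in the committed source.
    print(f'Using table {TABLE_NAME}')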

DynamoDB

The information presented on the website was stored in a DynamoDB database table. This is a NoSQL database, and there are important factors to consider when implementing this in a project.

Consider your usage

NoSQL databases do not require a defined relational schema as SQL databases do. They are flexible, allowing attributes to be introduced as needed. If the relationships between data points are important, then consider AWS's relational database options such as Amazon RDS.

Create a table

Your table is the place where you store your information. Each table requires a primary key, which can be a single partition key or a combination of a partition key and a sort key. Consider your choice carefully, as there is no way to change the primary key once the table is created. Reference the list of reserved words for DynamoDB to make sure you are not using one of those words, which will cause problems down the line.
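For reference, here is what creating a table with a composite primary key looks like in boto3. The names mirror the CrowFacts schema described earlier; the billing mode is illustrative.

import boto3

dynamodb = boto3.client('dynamodb')

# Composite primary key: "CrowSpecies" is the partition key and
# "habitat" is the sort key. This choice is permanent for the table.
dynamodb.create_table(
    TableName='CrowFacts',
    AttributeDefinitions=[
        {'AttributeName': 'CrowSpecies', 'AttributeType': 'S'},
        {'AttributeName': 'habitat', 'AttributeType': 'S'}
    ],
    KeySchema=[
        {'AttributeName': 'CrowSpecies', 'KeyType': 'HASH'},  # partition key
        {'AttributeName': 'habitat', 'KeyType': 'RANGE'}      # sort key
    ],
    BillingMode='PAY_PER_REQUEST'  # illustrative; provisioned capacity also works
)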

Set up Permissions

Other AWS services can only interact with your database if you set up permissions for them. AWS uses IAM permissions, which can be made as broad or as narrow as your use case requires. Setting up permissions correctly is a key factor in being able to run Lambda functions against your database or make API Gateway calls.

Import/Export Datasets

There are multiple approaches for handling the import and export of your DynamoDB information. See the following links for some options on how to implement it.

AWS Resources, SDK Links, Tutorials, and Helpful Tidbits: Importing Large Datasets
This document goes over some options for importing and exporting data in DynamoDB. It includes functional code for Lambda calls to import/export a .JSON file.

What is AWS Data Pipeline?
This is the AWS documentation on Data Pipeline, a service which can be used to import/export database information. Can become costly quickly, so use with caution.

Mockaroo
A good website to use to get some mock data to use when testing your import/export methods.
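As a taste of the simplest option, a Lambda-style import of a JSON file using boto3's batch writer might look like the sketch below. The file layout and table name are illustrative.

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('FunFacts')  # illustrative table name

# Load a JSON file shaped like [{"fact": ..., "source": ...}, ...]
# and write the items in automatically batched requests.
with open('facts.json') as f:
    items = json.load(f)

with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)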

Gotchas & Lessons Learned

  • Data Pipeline cost: it can become expensive quickly, so watch your usage.
  • Reserved words: check DynamoDB's reserved word list before naming attributes.
  • Everything took more work than we initially thought.
