This document walks through the one-time setup needed to operationalize the aws-do-pm project in your environment, starting from source.
git clone https://github.com/aws-samples/aws-do-pm
The project comes with a command line interface (CLI). To invoke the CLI, type ./pm and hit Enter. The CLI is equipped with help menus that can be accessed by adding --help as an argument on the command line.
./pm --help
================================================================
Predictive Modeling Command Line Interface
================================================================
Usage:
./pm [--help] - show generic cli help
./pm [target] [action] [arguments] - perform action on specified task with specified arguments
./pm [target] --help - show cli task specific help
targets:
config - manage predictive modeling configuration
system - execute predictive modeling system level tasks
go [tags] - generate a task template from a technique that matches the specified space-separated list of tags
do <input_json_file_path> - create and execute a task defined in the provided input json file
technique - manage predictive modeling techniques
data - manage datasets
model - perform actions on predictive models
service - perform actions on model services
task - manage tasks
To review the available configuration options, please execute:
./pm config --help
Output:
Usage:
./pm config <action> [arguments]
actions:
ls - list standard config items
show [item] - show value of all or a single config item
set [[item] [value]] - set values of all or a specific config item
items:
AWS_ACCESS_KEY_ID - key id for access to AWS API, if not specified, settings from 'aws configure' take effect
AWS_SECRET_ACCESS_KEY - secret key for access to AWS API, if not specified, settings from 'aws configure' take effect
REGION - AWS default region, default: us-west-2
REGISTRY - container registry URI including trailing '/', can be blank if running locally
BUILD_TO - target orchestrator for building container images. Options: docker(default), compose
PM_TO - target orchestrator for running aws-do-pm containers. Options: compose(default), kubernetes
PROCESSOR - target processor to run model operations on. Options: cpu(default), gpu
PM_PLATFORM_SCALE - number of platform containers to run (default: 1)
PM_S3_BUCKET - S3 bucket name for storing aws-do-pm data and models
PM_GRAPHDB_SYSTEM_CR - credential to configure for system access to the graph database
PM_GRAPHDB_USER - login id to configure for user access to the graph db
PM_GRAPHDB_PWD - credential to configure for user access to the graph database
PV_TYPE - persistent volume type, use 'efs'(default) for multi-node Kubernetes on AWS, and 'hostpath' for local deployments
EFS_VOLUME_ID - EFS file system id to use as a shared volume for aws-do-pm workloads (used only when PM_TO=kubernetes, default='')
ALB_ALLOW_CIDRS - comma-separated list of IP CIDRs that will be allowed to connect to aws-do-pm UIs (used only when PM_TO=kubernetes, default=0.0.0.0/0)
KCFG_ENC - base64 encoded kube config for aws-do-pm to use when orchestrating tasks on kubernetes (used only when PM_TO=kubernetes, default='')
Next, perform the initial configuration of the project by executing ./pm config set as in the example below. Please use actual values for your AWS credentials and REGISTRY. A blank REGISTRY setting is acceptable if you plan to run the demo only locally.
./pm config set
Enter value for AWS_ACCESS_KEY_ID [Enter=]: ABABABABABABABABABAB
Enter value for AWS_SECRET_ACCESS_KEY [Enter=]: mlkajhdlkjhqoiuelkjahsdlfhl
Enter value for REGION [Enter=]: us-west-2
Enter value for REGISTRY [Enter=]: 123456789876.dkr.ecr.us-west-2.amazonaws.com/
Enter value for BUILD_TO [Enter=docker]: docker
Enter value for PM_TO [Enter=compose]: compose
Enter value for PROCESSOR [Enter=cpu]: cpu
Enter value for PM_S3_BUCKET [Enter=aws-do-pm]: aws-do-pm
Enter value for PM_GRAPHDB_SYSTEM_CR [Enter=]: root
Enter value for PM_GRAPHDB_ID [Enter=]: pm
Enter value for PM_GRAPHDB_CR [Enter=]: pm
Enter value for PV_TYPE [Enter=efs]:
Enter value for EFS_VOLUME_ID [Enter=]:
Enter value for ALB_ALLOW_CIDRS [Enter=0.0.0.0/0]:
Enter value for KCFG_ENC [Enter=]:
This script configures all necessary settings for the predictive modeling application. You can re-run the ./pm config set command if you wish to make additional changes. To edit a single value interactively, you can use ./pm config set <SETTING_NAME>. To configure a setting with a particular value, you can use ./pm config set <SETTING_NAME> <VALUE>, and to enter expert configuration mode, you can execute ./config.sh.
Note: [Enter=] in the settings above means that the default value is blank. When a value is not specified after the ":", it is ok to leave it blank for docker-compose. The settings PV_TYPE, EFS_VOLUME_ID, ALB_ALLOW_CIDRS, and KCFG_ENC are needed only when the target orchestrator is Kubernetes.
Note: The PROCESSOR value specified here is a global setting. Each technique can be registered with a task-specific processor setting that overrides this value (Example: ../src/python/pm/technique/technique_registration_model_build_ann.json ). If the processor is set to gpu and a GPU device is not available in the container, the task will run on CPU. If the processor is set to a blank string, CPU is used by default.
The project runs all workloads in containers. The container images are built with the ./build.sh script:
./build.sh --help
Usage:
./build.sh [option] - builds one or more containers in this project.
The build behavior is controlled by settings
in the configuration file .env
BUILD_TO - controls whether the build is run by docker or docker-compose
use './config.sh BUILD_TO docker' or './config.sh BUILD_TO compose' to set
BUILD_TO=docker is recommended since it can build all container images
BUILD_TO=compose will only build the platform and ui container image
options when BUILD_TO=docker:
'' - build the last built Dockerfile
ls - list available Dockerfiles to build
all - build all Dockerfiles sequentially
To build the container images from source, simply execute:
./build.sh all
If you would like to run aws-do-pm on Kubernetes or outside of your build environment, you will need to push the container images to a container registry. The project provides scripts to automate this task. You must specify a REGISTRY in the project configuration before you build and push container images.
./pm config set REGISTRY - provide the container registry URI, including the trailing '/'
./ecr-setup.sh - create all necessary repositories in the registry
./login.sh - authenticate with Amazon Elastic Container Registry
./push.sh all - push all container images built by the aws-do-pm project
./push.sh --help
Usage: ./push.sh [arg] - push one or more images to the configured registry
arg:
'' - push the latest build image only
ls - list available images to push
all - push all project images
image - push specified image
To push the images to your registry, execute the one-time setup:
./pm config set REGISTRY
./ecr-setup.sh
then each time you'd like to push changes to your container images, you can execute:
./login.sh
./push.sh all
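For reference, the authentication step that ./login.sh automates typically amounts to the standard ECR login command. The sketch below is a hypothetical equivalent, not the project's actual script; it assumes AWS CLI v2 and docker are installed, and the ecr_login helper name and account id are illustrative:

```shell
# Hypothetical equivalent of the ./login.sh step (assumes AWS CLI v2 and docker).
ecr_login() {
  local region="$1" registry="$2"
  # The REGISTRY setting includes a trailing '/'; docker login expects the bare registry host.
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "${registry%/}"
}
# Example (placeholder account id):
# ecr_login us-west-2 123456789876.dkr.ecr.us-west-2.amazonaws.com/
```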
The project can be deployed on a single host or a cluster of machines using docker-compose or kubernetes. The Do framework management scripts (./run.sh, ./status.sh, and ./stop.sh) are used to run, monitor, and stop the deployment on any of the configured target orchestrators.
This target orchestrator is suitable for use on a single machine running either Docker Engine or Docker Desktop.
To bring the application up using docker-compose, ensure PM_TO is set to compose and then execute ./run.sh.
./pm config set PM_TO compose
./run.sh
Output:
Running container aws-do-pm-platform-compose on compose ...
Generating compose files ...
Creating network "aws-do-pm_default" with the default driver
Creating aws-do-pm_platform_1 ... done
Creating aws-do-pm_ui_1 ... done
Creating aws-do-pm_graphdb_1 ... done
Note: docker-compose refers to the standalone Docker Compose distribution, while docker compose refers to the Docker Compose distribution included in Docker Desktop. By default, aws-do-pm uses docker-compose. If you would like to use docker compose, just modify the DOCKER_COMPOSE setting in the advanced configuration by executing ./config.sh and setting this line: export DOCKER_COMPOSE="docker compose".
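If you are unsure which Compose distribution is present on your machine, a small shell check like the one below can pick a suitable value for the DOCKER_COMPOSE setting. This is an illustrative snippet, not part of the project scripts:

```shell
# Prefer the Docker Desktop plugin ("docker compose") when available,
# otherwise fall back to the standalone docker-compose binary.
if docker compose version >/dev/null 2>&1; then
  DOCKER_COMPOSE="docker compose"
else
  DOCKER_COMPOSE="docker-compose"
fi
echo "Using: ${DOCKER_COMPOSE}"
```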
To verify that the aws-do-pm containers have started successfully, execute the ./status.sh script:
./status.sh
Output:
Showing status on target orchestrator compose ...
62d1e9425256 arangodb:latest "/entrypoint.sh aran…" 5 seconds ago Up 3 seconds 0.0.0.0:8529->8529/tcp aws-do-pm_graphdb_1
65fa7f6f0bb6 ************.dkr.ecr.us-west-2.amazonaws.com/aws-do-pm-platform:latest "streamlit run /src/…" 5 seconds ago Up 3 seconds 0.0.0.0:5679->5679/tcp, 0.0.0.0:8501->8501/tcp aws-do-pm_ui_1
2b7908df5caa ************.dkr.ecr.us-west-2.amazonaws.com/aws-do-pm-platform:latest "/startup.sh" 5 seconds ago Up 3 seconds 0.0.0.0:5678->5678/tcp aws-do-pm_platform_1
The framework provides two user interfaces: a platform UI ( http://localhost:8501 ) and a graph database UI ( http://localhost:8529 ). You may browse each of these URLs to ensure the services are reachable. There will be no data, since the system has not been initialized. The platform UI does not require a login; to gain access to the database UI, use the configured login id and credential for the graphdb user account.
You have deployed aws-do-pm locally on Docker Compose. Next, you may proceed to running the single EV demo.
A Kubernetes instance can run locally on a single node, or it can run distributed in a multi-node cluster.
This deployment is suitable for use on a local, single-node Kubernetes instance like the one provided by Docker Desktop.
To run aws-do-pm on a single-node local Kubernetes instance, set the PM target orchestrator to kubernetes, set the persistent volume type to hostpath, and execute the ./run.sh script as shown below:
./pm config set PM_TO kubernetes
./pm config set PV_TYPE hostpath
./run.sh
Output:
Running container aws-do-pm-platform-kubernetes on kubernetes ...
Generating Kubernetes manifests ...
secret/regcred created
To verify that aws-do-pm has started successfully, execute the ./status.sh script:
./status.sh
Output:
Showing status on target orchestrator kubernetes ...
NAME READY STATUS RESTARTS AGE
graphdb-74b77d866-k5clv 1/1 Running 0 45s
platform-7cd9895987-tkvbz 1/1 Running 0 45s
ui-799cc55c79-mc794 1/1 Running 0 45s
Note: Please allow at least 45 seconds for the pods to enter the Running state. You may run the ./status.sh script multiple times, or execute watch ./status.sh to observe the deployment status while the pods are starting up.
To browse the UI services that this deployment offers, we will use port forwarding from a local port to the corresponding service port in Kubernetes. The ./expose.sh script can be used for this purpose.
./expose.sh --help
Output:
Usage: ./expose.sh [service_name] [local_port_number] - expose a service locally through port forwarding
service_name - name of service to expose (ui|graphdb), default: ui
local_port_number - local port number to forward to the service port, default: 8501
To expose the platform UI service, execute:
./expose.sh ui
Then browse http://localhost:8501 and you should see the screen shown in Fig. 1.
Similarly, you can expose the graphdb UI by executing:
./expose.sh graphdb
You can then browse http://localhost:8529 and see the screen shown in Fig. 2.
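Under the hood, ./expose.sh amounts to kubectl port forwarding. A hypothetical direct equivalent is sketched below; the expose_service helper name is illustrative, the namespace and default ports are taken from the rest of this walkthrough, and the assumption that the local and service port numbers match may not hold for every service:

```shell
# Hypothetical kubectl equivalent of ./expose.sh (requires cluster access; blocks while forwarding).
expose_service() {
  local svc="${1:-ui}" port="${2:-8501}"
  # Forward the local port to the same-numbered port on the service inside the cluster.
  kubectl -n aws-do-pm port-forward "service/${svc}" "${port}:${port}"
}
# Example: expose_service graphdb 8529
```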
A deployment of this type is suitable for running large, distributed predictive modeling tasks.
This section assumes that a Kubernetes cluster is available and that we have credentials allowing full access to the cluster for deployment of aws-do-pm. We also assume that the EFS CSI Driver is deployed on the cluster, since we will be using an Amazon EFS volume as shared storage. Setting up an Amazon Elastic Kubernetes Service (EKS) cluster and deploying the EFS CSI driver can be done using the aws-do-eks project. Alternatively, you can follow the steps in this end-to-end deployment walkthrough, which creates a cluster using aws-do-eks and then deploys aws-do-pm on it.
To run aws-do-pm, ensure PM_TO is set to kubernetes, PV_TYPE is set to efs, and that REGISTRY, EFS_VOLUME_ID, ALB_ALLOW_CIDRS, and KCFG_ENC are configured, then execute the ./run.sh script.
The KCFG_ENC setting can be configured by executing the kube config script ./kcfg.sh.
./kcfg.sh --help
Output:
Usage ./kcfg.sh [kubeconfig_path] - configure pm platform for access to Kubernetes
kubeconfig_path - optional location of the kube config file to encode
default: ~/.kube/config
./kcfg.sh
Output:
NEW_VALUE is:
YXBpVmVyc2lvbjogd...
... ... ... ... ...
... ... ... TmZtTGcK
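The value ./kcfg.sh produces is, in essence, a single-line base64 encoding of the kube config file. A minimal sketch of that encoding step follows; the encode_kubeconfig helper is illustrative and the actual script may do more:

```shell
# Emit a single-line base64 encoding of a kube config file (default: ~/.kube/config).
encode_kubeconfig() {
  base64 < "${1:-$HOME/.kube/config}" | tr -d '\n'
}
# Example: ./pm config set KCFG_ENC "$(encode_kubeconfig)"
```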
To configure the rest of the settings, it is convenient to use the interactive wizard.
./pm config set
Last, execute the ./run.sh script to deploy aws-do-pm.
./run.sh
Output:
Running container aws-do-pm-platform-kubernetes on kubernetes ...
Generating Kubernetes manifests ...
Warning: batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
secret/regcred created
To make sure that aws-do-pm is healthy, execute the ./status.sh script.
./status.sh
Output:
Showing status on target orchestrator kubernetes ...
NAME READY STATUS RESTARTS AGE
graphdb-74b77d866-2djhj 1/1 Running 0 2m12s
platform-5c4f55f97f-hvb6p 1/1 Running 0 2m12s
ui-69b78dffdc-mln4d 1/1 Running 0 2m12s
The aws-do-pm UIs running in a distributed Kubernetes instance are exposed via ingress objects and Application Load Balancers managed by the AWS Load Balancer Controller deployed to the cluster. The ingress objects are created automatically by the run script. To view the URLs corresponding to the ingress objects, execute the command below:
kubectl get ingress
Output:
NAME CLASS HOSTS ADDRESS PORTS AGE
graphdb <none> * k8s-awsdopm-graphdb-5bb477c842-43825769.us-west-2.elb.amazonaws.com 80 16h
ui <none> * k8s-awsdopm-ui-547c3e085f-1541392111.us-west-2.elb.amazonaws.com 80 16h
If you copy and paste the ui address into your browser, you will see the screen shown in Fig. 1. Similarly, browsing the graphdb address will bring up the screen shown in Fig. 2.
Note: If you need to temporarily change the allowed CIDRs for an ingress, you can open the ingress object by executing kubectl edit ingress <ingress_name> and edit the alb.ingress.kubernetes.io/inbound-cidrs: annotation in place.
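For a scripted, non-interactive alternative to editing the ingress, the same annotation can be set directly with kubectl annotate. The helper below is a hypothetical sketch; the set_ingress_cidrs name and the CIDR values are placeholders:

```shell
# Overwrite the ALB inbound-cidrs annotation on an ingress without opening an editor.
set_ingress_cidrs() {
  local ingress="$1" cidrs="$2"
  kubectl annotate ingress "$ingress" \
    "alb.ingress.kubernetes.io/inbound-cidrs=${cidrs}" --overwrite
}
# Example: set_ingress_cidrs ui "203.0.113.0/24,198.51.100.0/24"
```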
Several tools are available for monitoring and visualization of container workloads. The tools used in this section are optional and can be substituted as needed.
The Docker Desktop Dashboard can be used to monitor the local containers. Fig. 3 shows a local aws-do-pm deployment on Docker Compose.
Alternatively, the following docker CLI command can be used:
watch docker ps -a
Output:
Every 2.0s: docker ps -a 88665a0cc820.ant.amazon.com: Wed Feb 23 17:28:59 2022
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
34ad08b7ca42 9248046a6a72 "streamlit run /src/…" 2 hours ago Up 2 hours 0.0.0.0:5679->5679/tcp, 0.0.0.0:8501->8501/tcp aws-do-pm_ui_1
3ac1395eec5e 9248046a6a72 "/startup.sh" 2 hours ago Up 2 hours 0.0.0.0:5678->5678/tcp aws-do-pm_platform_1
685af435cf90 arangodb:latest "/entrypoint.sh aran…" 2 hours ago Up 2 hours 0.0.0.0:8529->8529/tcp aws-do-pm_graphdb_1
Kube-ops-view is a tool which allows visualization of a Kubernetes cluster. We can use the following command to run kube-ops-view locally:
docker run -it -p 8080:8080 -v ~/.kube/config:/root/.kube/config -e KUBECONFIG_CONTEXTS=aws-do-pm-eks,docker-desktop -e KUBECONFIG_PATH=/root/.kube/config hjacobs/kube-ops-view
After the kube-ops-view container is up, browsing the URL http://localhost:8080 visualizes the clusters configured through the previous command. Fig. 4 below shows a 10-node Kubernetes cluster where aws-do-pm has been deployed.
Alternatively, the following kubectl CLI command can be used:
watch "kubectl -n aws-do-pm get pods"
Output:
Every 2.0s: kubectl -n aws-do-pm get pods 5a014daa7770: Thu Feb 24 01:37:06 2022
NAME READY STATUS RESTARTS AGE
graphdb-74b77d866-7vgk9 1/1 Running 0 48s
platform-5c4f55f97f-fc5r4 1/1 Running 0 48s
ui-69b78dffdc-hcpsx 1/1 Running 0 48s
When the number of pods is too large to fit on one screen, we can just watch the pod count, using the following command:
watch "kubectl -n aws-do-pm get pods | grep -v NAME | wc -l"
Sample output:
Every 2.0s: kubectl -n aws-do-pm get pods | gre... 5a014daa7770: Thu Feb 24 01:38:46 2022
1360
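The grep -v NAME filter in the command above simply drops the header row, so that wc -l counts only pod lines. This can be seen by running the pipeline on a captured sample of the kubectl output shown earlier:

```shell
# Count pod lines by filtering out the header row, using sample kubectl output.
sample='NAME                        READY   STATUS    RESTARTS   AGE
graphdb-74b77d866-7vgk9     1/1     Running   0          48s
platform-5c4f55f97f-fc5r4   1/1     Running   0          48s
ui-69b78dffdc-hcpsx         1/1     Running   0          48s'
count=$(printf '%s\n' "$sample" | grep -v NAME | wc -l)
echo "$count"   # 3 pods
```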
At this point, your project is configured, built, and deployed. A great way to see predictive modeling in action using the aws-do-pm framework is to run the included demos.
Back to main README.md