Dev environment for contiamo
Get the dev environment fast!
Get started:
- make docker-auth
- make pull
Get the latest versions:
- git pull
- make pull
Start everything in normal mode:
- make start
Stop everything:
- make stop
Stop everything and clean up:
- make clean
Prepare for Pantheon-external mode (only do this once):
- make build
- sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'
Start everything in Pantheon-external mode:
- make pantheon-start
- (In Pantheon directory) env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt run
Enable TLS verify-full mode on port 5435:
- Download the private key for *.dev.contiamo.io: make get-pg-key
- echo "127.0.0.1 pg-localhost.dev.contiamo.io" | sudo tee -a /etc/hosts
- make build
- make pantheon-start
- You may need to tell your local psql about the IdenTrust root we happen to be using: curl https://letsencrypt.org/certs/trustid-x3-root.pem.txt > ~/.postgresql/root.crt
- psql "user=lemon@example.com password=
Before you start, you must install Docker and Docker-Compose.
Additionally, the development environment requires access to our private Docker registry. To get access, ask the Ops team for permissions. Once permissions have been granted, you must install the gcloud CLI.
Once installed, run
```sh
make docker-auth pull
```
This will attempt to
1. authenticate with Google,
2. configure your Docker installation to use the new Google credentials, and
3. pull the required Docker images.
```sh
make start
```
Once the environment has started, you should see a message with a URL and credentials, like this:
```
Dev ui: http://localhost:9898/contiamo/profile
Email: lemon@example.com
Password: localdev
```
The above section starts with a completely empty environment. A standard development environment
with preconfigured data sources (the internal metadbs) is provided in the project and can be started
with
```sh
make load-snapshot
```
The existing environment (if any) will be stopped and _destroyed_, so be careful. It will then
start the db, load the data, and then start the rest of the environment.
The environment
* contains two users, lemon@example.com and lemonjr@example.com, both with password localdev
* has all of the datahub metadbs installed: foodmart, alaska, and liftdata
* has two virtualdbs with two views each: one shows the maintenance tasks inside Hub and the other shows the use of PostGIS queries
* Mr. Lemon is an admin for everything
* Lemon Jr is not an admin and has various permission levels; liftdata is private and not available to Lemon Jr
* There is a basic amount of metadata assigned to the datasources and tables including custom fields, descriptions, a mix of names, and even one with documentation
This should allow for basic development and testing of most use cases.
| variable                 | value                                             |
|--------------------------|---------------------------------------------------|
| AUTH_IMAGE               | eu.gcr.io/dev-and-test-env/idp:dev                |
| GRAPHQL_IMAGE            | eu.gcr.io/dev-and-test-env/pgql-server:dev        |
| UI_IMAGE                 | eu.gcr.io/dev-and-test-env/contiamo-ui:dev        |
| HUB_IMAGE                | eu.gcr.io/dev-and-test-env/hub:dev                |
| DATASTORE_IMAGE          | eu.gcr.io/dev-and-test-env/datastore:dev          |
| PANTHEON_IMAGE           | eu.gcr.io/dev-and-test-env/pantheon:dev           |
| PROFILER_IMAGE           | eu.gcr.io/dev-and-test-env/profiler:dev           |
| SYNC_INGESTER_IMAGE      | eu.gcr.io/dev-and-test-env/sync-ingester:latest   |
| SYNC_AGENT_TABLEAU_IMAGE | eu.gcr.io/dev-and-test-env/sync-agent-tableau:dev |
You can manually override the image used by setting the required variable and then restarting the services:
```sh
export HUB_IMAGE=eu.gcr.io/dev-and-test-env/hub:v1.2.3
make stop start
```
To use the sign-up service or integration sync-agents for other resource types (like Tableau), you need to enable the optional integration services. To do this, export this environment variable:
```sh
export COMPOSE_FILES="-f docker-compose.yml -f docker-compose-extra.yml"
```
This will modify the start and stop commands to include the integration services.
A helper make target is provided that will automatically pull and restart the local environment with PR preview images for the specified services.

For example, to test PR 501 for hub together with PR 489 for contiamo-ui, use
```sh
make pr-preview services=hub:501,contiamo-ui:489
```
All other services will use the default images.

To reset to the original state, use
```sh
make stop start
```
The project comes with a suite of end-to-end tests that use the API to verify that the backend services are working as expected. You can run this in any environment by using
```sh
make test
```
This assumes that you have already started the localdev environment using make start or make pr-preview.
By default, the Datasets feature does not work with external DWH systems (e.g. Redshift, Snowflake). Data transfer to these systems needs to go
through mutually accessible object storage. There is a
pantheon-datasource-test bucket on S3, but this repo doesn't include credentials for it.
If your scenario requires working with an external DWH, you can pass S3 credentials by setting the DATASETS_AWS_ACCESS_KEY_ID and DATASETS_AWS_SECRET_ACCESS_KEY
environment variables. Additionally, the bucket name should be set via the DATASETS_S3_BUCKET variable. An easy way to set these
variables is the .env file.
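
As a quick sanity check before starting the environment, the sketch below looks for the three variables named above in an env file. The function name `check_datasets_env_file` is hypothetical and not part of this repo, and the check only verifies that each variable has a non-empty value, not that the credentials are valid.

```sh
# Hypothetical helper (not part of this repo): report which of the Datasets
# S3 variables are missing or empty in an env file (defaults to ./.env).
check_datasets_env_file() {
  env_file="${1:-.env}"
  missing=0
  for name in DATASETS_AWS_ACCESS_KEY_ID DATASETS_AWS_SECRET_ACCESS_KEY DATASETS_S3_BUCKET; do
    # "NAME=" followed by at least one character counts as set.
    if ! grep -q "^${name}=." "$env_file" 2>/dev/null; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_datasets_env_file` from the repo directory prints one line per missing variable and returns a non-zero status if any are absent.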
Two datasets have been created that provide more interesting stats and entity detection profiles. These should be used to test the Profiler and the related UI components.

The datasets are available in ./datasets:
1. pii.csv contains PII columns that should be detected during the entity detection profile.
2. sales.csv also contains PII data, but is a good sample for the stats report.

The following sample databases are also available:
* Postgres database that contains a single table liftdata.
* Postgis (Postgres) database that contains geometry of Alaska regions. The purpose is to test geometry-related operations for Pantheon and PGQL server.
#### Lift data
After starting the local dev environment, run:
```sh
docker run --name liftdata --rm --network dev_default eu.gcr.io/dev-and-test-env/deutschebahn-liftdata-postgres:v1.0.0
```
In the Data Hub, you can now add the external data source using:
| field      | value               |
|------------|---------------------|
| HOST       | liftdata            |
| PORT       | 5432                |
| DATABASE   | liftdata            |
| USER       | pantheon            |
| PASS       | contiamodatahub19   |
When you are done, run
```sh
docker kill liftdata
```
to stop and clean up the database container.
#### Postgis Alaska regions
After starting the local dev environment, run:
```sh
docker run --name alaska --rm --network dev_default eu.gcr.io/dev-and-test-env/alaska-postgis:1.0.0
```
In the Data Hub, you can now add the data source using:
| field      | value               |
|------------|---------------------|
| HOST       | alaska              |
| PORT       | 5432                |
| DATABASE   | alaska              |
| USER       | pantheon            |
| PASS       | contiamodatahub19   |
When you are done, run
```sh
docker kill alaska
```
to stop and clean up the database container.
```sh
make stop
```
Any data in the databases will be preserved between stop and start.
| service     | db name     | host     | port   | username   | password   |
|-------------|-------------|----------|--------|------------|------------|
| datastore   | datastore   | metadb   | 5433   | user       | localdev   |
| hub         | hub         | metadb   | 5433   | user       | localdev   |
| idp         | simpleidp   | metadb   | 5433   | user       | localdev   |
| pantheon    | pantheon    | metadb   | 5433   | pantheon   | test       |
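
For quick access from a terminal, the table above can be turned into connection URIs. This is a minimal sketch: the function name `metadb_uri` and the `METADB_HOST` override are hypothetical, and the default host assumes the meta-DB port is published on 127.0.0.1 as described for local Pantheon development.

```sh
# Hypothetical helper (not part of this repo): print a PostgreSQL connection
# URI for one of the services in the table above.
metadb_uri() {
  case "$1" in
    datastore) db=datastore; user=user;     pass=localdev ;;
    hub)       db=hub;       user=user;     pass=localdev ;;
    idp)       db=simpleidp; user=user;     pass=localdev ;;
    pantheon)  db=pantheon;  user=pantheon; pass=test ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
  # METADB_HOST is an assumed override; use "metadb" if it resolves on your machine.
  echo "postgresql://${user}:${pass}@${METADB_HOST:-127.0.0.1}:5433/${db}"
}
```

For example, `psql "$(metadb_uri hub)"` would open a session on the hub database, provided psql is installed locally.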
Go to http://localhost:5050 (The link is on http://localhost:9898/lemonade-shop/configuration page)
Login with the following credentials:
- Email: pgadmin4@pgadmin.org
- Password: admin
Add the metadb server with the following connection info:
- Host name/address: metadb
- Port: 5433
- Username: user
- Password: localdev
- Save password?: ✅
```sh
make clean
```
This will stop your current environment and remove any Docker volumes related to it. This includes any data and metadata in the databases.
As time goes on, Docker will download new images, but it does not automatically garbage collect old images. To do so, run docker system prune.
On Mac, all Docker file system data is stored in a single file of a fixed size, which is 16GB or 32GB by default. You can configure the size of this file by clicking on the Docker Desktop tray icon -> Preferences -> Disk -> move the slider.
You can find snapshot.sh and restore.sh files in the ./scripts folder.
Both scripts take a single parameter: a filename.
To make an encrypted snapshot from your local dev environment use:
```sh
./snapshot.sh localdev.snapshot
```
This will ask you to set the encryption key and then export the database of each service, applying compression.
The snapshot is encrypted with a symmetric key (AES-128 cipher).
To erase your local database for each service and restore it to the state of the earlier exported snapshot use:
```sh
./restore.sh localdev.snapshot
```
This will delete all the data you have locally and perform the reverse operation of snapshot.sh.
IMPORTANT: do not move the scripts out of their ./scripts folder, they use relative paths.
The make load-snapshot target uses the committed localdev.snapshot. You can use the scripts, as described above, to load any other snapshot.
* Run make or make help to see all available commands.
* You can also run these commands from a different directory, with e.g. make -C /path/to/dev start.
* The commands in the Makefile are very useful, but there's some extra stuff available if you use docker-compose straight. For instance, get all logs with docker-compose logs --follow, or only datastore worker logs with docker-compose logs --follow ds-worker. Refer to docker-compose.yml for the definitions of the services.
* To use docker-compose without cd'ing to this directory, use e.g. docker-compose -f /path/to/dev/docker-compose.yml logs --follow.
| Server      | Environment Variable | Default  |
|-------------|----------------------|----------|
| datastore   | DATASTORE_TAG        | dev      |
| idp         | IDP_TAG              | dev      |
| pantheon    | PANTHEON_TAG         | latest   |
| contiamo-ui | CONTIAMOUI_TAG       | latest   |
In the environment variable POSTGRES_ARGS, you can pass extra arguments to the PostgreSQL daemon. By default, this is set to -c log_connections=on. To log modification statements in addition to connections, start the dev environment with
```sh
env POSTGRES_ARGS="-c log_connections=on -c log_statement=mod" make start
```
You can inspect these logs with docker-compose logs --follow metadb. The four acceptable values for log_statement are none, ddl, mod, and all. Further Postgres options can be found here: https://www.postgresql.org/docs/11/runtime-config.html .
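
Since only four values are accepted for log_statement, a small wrapper can catch typos before the environment is started. This is a sketch only: the function name `postgres_args` is hypothetical and not part of this repo.

```sh
# Hypothetical helper (not part of this repo): build a POSTGRES_ARGS value
# for a given log_statement level, rejecting anything Postgres would not accept.
postgres_args() {
  case "$1" in
    none|ddl|mod|all)
      echo "-c log_connections=on -c log_statement=$1"
      ;;
    *)
      echo "invalid log_statement value: $1 (expected none, ddl, mod, or all)" >&2
      return 1
      ;;
  esac
}
```

It could then be used as `env POSTGRES_ARGS="$(postgres_args all)" make start`.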
Local Pantheon debug development is supported by port redirection. To set this up, you first need to run two extra steps.
1. Run
```sh
make build
```
This builds the eu.gcr.io/dev-and-test-env/pantheon:redir Docker image, a "pseudo-Pantheon" that forwards everything to your local Pantheon on 127.0.0.1 port 4300. _Do not push this image!_
2. Modify your /etc/hosts file to add
```
127.0.0.1 metadb
```
You can easily do this with sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'.
This ensures that Pantheon can correctly resolve the storage database service.
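
Running the append command more than once leaves duplicate lines in the hosts file. The sketch below adds an entry only if that exact line is not already there. The function name `add_hosts_entry` is hypothetical and not part of this repo, and writing to /etc/hosts still requires root privileges.

```sh
# Hypothetical helper (not part of this repo): append a hosts entry only if
# the exact line is not already present. The second argument lets you try it
# on a scratch file instead of /etc/hosts.
add_hosts_entry() {
  entry="$1"
  hosts_file="${2:-/etc/hosts}"
  # -x matches the whole line, -F treats the entry as a fixed string.
  if grep -qxF "$entry" "$hosts_file" 2>/dev/null; then
    echo "already present: $entry"
  else
    echo "$entry" >> "$hosts_file"
    echo "added: $entry"
  fi
}
```

To apply it to the real hosts file, the function has to run in a root shell, for example after `sudo -s`, and then `add_hosts_entry "127.0.0.1 metadb"`.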
Make sure you first set up the prerequisites, and also set up for Pantheon local development.
To start the Pantheon dev environment use
```sh
make pantheon-start
```
This will replace the Pantheon image with a simple port redirection image that will enable transparent redirect of
- http://localhost:9898/pantheon/api/v1/ to http://localhost:4300/api/v1/ ,
- http://localhost:9898/pantheon/jdbc/ to http://localhost:8765/ .
You can then start your local Pantheon debug build, e.g. from your IDE, and have it bind to those ports on localhost. To configure the meta-DB and enable data store from Pantheon, run SBT with
```sh
env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt
```
or set the same environment variables in IntelliJ. You can also use export METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external to set the environment variables in the current terminal.
----
The docker-compose configuration will expose the following ports for use from local Pantheon:
- Nginx web server at 127.0.0.1 port 9898 <-- Use this to access Data Hub including UI, IDP, Pantheon, Datastore.
- PostgreSQL meta-DB at 127.0.0.1 port 5433, username pantheon, password test.
- Datastore manager at 127.0.0.1 port 9191
- Minio (for ingested files) at 127.0.0.1 port 9000
When accessing Pantheon via Nginx on port 9898, you need to pre-pend /pantheon to Pantheon URLs, for instance: http://localhost:9898/pantheon/api/v1/status . Nginx will strip off the /pantheon, authenticate the request with IDP, and forward the request to Pantheon as /api/v1/status.
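
The prefix rule can be captured in a small helper so that Pantheon paths do not have to be rewritten by hand. This is a sketch under stated assumptions: the function name `pantheon_url` and the `NGINX_BASE` override are hypothetical and not part of this repo.

```sh
# Hypothetical helper (not part of this repo): turn a Pantheon path into the
# URL served through Nginx by prepending /pantheon.
pantheon_url() {
  path="$1"
  # Make sure the path starts with a slash before adding the prefix.
  case "$path" in
    /*) ;;
    *) path="/$path" ;;
  esac
  # NGINX_BASE is an assumed override for setups that use another address.
  echo "${NGINX_BASE:-http://localhost:9898}/pantheon${path}"
}
```

For example, `pantheon_url /api/v1/status` prints the status URL quoted above. Requests through Nginx are authenticated with IDP, so a plain `curl` against that URL may be rejected without credentials.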
Using the pantheon/test credentials for Postgres, you also have access to
- the metadb database, for datastore,
- collection databases corresponding to a managed DB,
- collection databases corresponding to materializations for a project,
- the simpleidp database.
You can also run Pantheon in prod mode locally, as follows.
1. In sbt shell, run `dist`.
2. From a console, run `docker build -t eu.gcr.io/dev-and-test-env/pantheon:local .` This will download dependencies if they are not cached yet, build a Docker image for Pantheon, and tag it `local`.
3. Run `env PANTHEON_TAG=local make start`.
Now datastore and metadb will still be available on the usual ports, but Nginx will proxy to a prod-mode Pantheon which runs inside Docker. Pantheon will automatically be run with appropriate environment variables (https://github.com/contiamo/dev/blob/master/docker-compose.yml#L81).
Warning! Do not push this image to GCR. It may accidentally end up being deployed on dev.contiamo.io .
The Profiler currently lives at http://localhost:8383.