In this guide
- Hardware considerations
- Configuring the firewall
- Installing Lang.ai
- Using an external Elasticsearch cluster and PostgreSQL database
- Using Lang.ai Enterprise
You will need the following in order to install Lang.ai:
- A license file that was sent to you by our team
- A confirmation email address, already provided to our team, to activate the instance
Hardware considerations
Lang.ai has specific hardware requirements because of the amount of memory needed to run the intent induction process. It also requires a persistent data disk to store the classifiers' information.
Our Enterprise platform is currently designed to run on a single instance by default. For users who require high API scalability, Lang.ai also offers a Kubernetes version adapted to the needs of the specific use case.
The minimum hardware requirements are the following:
- 4 CPU cores
- 32 GB memory
- 100 GB data disk space
More resources may be required depending on your usage, such as classifier and dashboard activity. The larger the dataset to be processed, the more memory must be allocated.
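As a quick sanity check before installing, the CPU and memory minimums above can be compared against what the instance actually reports. The following is a minimal sketch, not part of the Lang.ai tooling: the function names are illustrative, and it assumes a Linux host with `nproc`, `awk`, and `/proc/meminfo` available.

```shell
# Minimums taken from this guide.
MIN_CPU_CORES=4
MIN_MEMORY_GB=32

# Succeeds when the actual value ($1) is at least the required value ($2).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Number of processing units available to this machine.
cpu_cores() {
  nproc
}

# Total memory in GB, rounded up. /proc/meminfo reports kB, and the kernel
# reserves some memory, so a nominal 32 GB machine reports slightly less.
memory_gb() {
  awk '/^MemTotal:/ { gb = $2 / 1048576; r = int(gb); if (gb > r) r++; print r }' /proc/meminfo
}

# Prints one line per resource with the measured value and the verdict.
report_hardware() {
  local cores mem
  cores="$(cpu_cores)"
  mem="$(memory_gb)"
  if meets_minimum "$cores" "$MIN_CPU_CORES"; then
    echo "CPU cores: $cores (ok)"
  else
    echo "CPU cores: $cores (below the minimum of $MIN_CPU_CORES)"
  fi
  if meets_minimum "$mem" "$MIN_MEMORY_GB"; then
    echo "Memory: $mem GB (ok)"
  else
    echo "Memory: $mem GB (below the minimum of $MIN_MEMORY_GB GB)"
  fi
}

# Usage: source this file on the instance, then run:
#   report_hardware
```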
In terms of software, we support every major Linux distribution.
Setting up the data disk
To save the data generated by the classifiers and dashboards, the instance requires a persistent data disk. When creating the instance, make sure to use a volume of at least 100 GB.
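To confirm that the attached volume meets the 100 GB minimum, you can read the raw size of the block device. This is a sketch under assumptions: the device name `/dev/nvme1n1` is a placeholder (use `lsblk` to find the real one), the function names are illustrative, and 100 GB is interpreted as 100,000,000,000 bytes.

```shell
# Minimum data disk size from this guide: 100 GB, read here as decimal gigabytes.
MIN_DISK_BYTES=100000000000

# Succeeds when a size in bytes ($1) meets the minimum.
volume_is_large_enough() {
  [ "$1" -ge "$MIN_DISK_BYTES" ]
}

# Prints the size in bytes of a block device, without partitions or headings.
device_size_bytes() {
  lsblk --bytes --nodeps --noheadings --output SIZE "$1" | tr -d '[:space:]'
}

# Usage (the device name is a placeholder):
#   lsblk                                    # list devices to find the data disk
#   size="$(device_size_bytes /dev/nvme1n1)"
#   volume_is_large_enough "$size" && echo "data disk is large enough"
```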
Configuring the firewall
AWS virtual machines are created as members of a VPC and a security group, which acts as a firewall. For the network associated with the Lang.ai Enterprise instance, you'll need to configure the firewall or security group to allow the required ports.
Make sure the following ports are accessible:
- Port 80: non-HTTPS Web application access
- Port 443: HTTPS Web application access
- Port 8800: Administration console access
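If you manage the security group with the AWS CLI, the three rules can be generated as shown in this sketch. It only prints the `aws ec2 authorize-security-group-ingress` commands so that you can review them first. The security group ID and CIDR range in the usage comment are placeholders, and the function name is illustrative.

```shell
# Ports required by this guide.
REQUIRED_PORTS="80 443 8800"

# Prints one AWS CLI command per required port. Nothing is executed.
# $1: security group ID, $2: source CIDR range allowed to connect.
print_ingress_commands() {
  local group_id="$1" cidr="$2" port
  for port in $REQUIRED_PORTS; do
    echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $port --cidr $cidr"
  done
}

# Usage (placeholder values):
#   print_ingress_commands sg-0123456789abcdef0 203.0.113.0/24
# After reviewing the output, pipe it to `sh` to apply the rules.
```

Restricting the CIDR range to the addresses that actually need access is usually preferable to opening the ports to everyone, especially for the administration console on port 8800.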
Installing Lang.ai
Once your instance is running, run the following commands to download and run the installation script for the stable release channel:
curl -sSL -o install.sh https://get.replicated.com/langai/stable/swarm-init
sudo bash ./install.sh
Once the installation is complete, you can continue the setup directly in the browser. It will be available on port 8800 of your instance's IP address or DNS name. Example: http://langai.mycompany.com:8800
Note: if you get a Docker Swarm error about the instance advertising multiple IPs, you can fix it by editing the install.sh file and setting the constant SWARM_ADVERTISE_ADDR to the public IP of the instance. You can find it on line 17 of the installation script.
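The edit described in the note can also be scripted. This sketch assumes that install.sh assigns the constant on a line starting with `SWARM_ADVERTISE_ADDR=`; check the actual script before relying on it. The function name and the IP address in the usage comment are placeholders.

```shell
# Sets SWARM_ADVERTISE_ADDR in the given script and keeps the original
# as <script>.bak. Assumes the assignment starts at the beginning of a line.
# $1: path to install.sh, $2: public IP of the instance.
set_swarm_advertise_addr() {
  local script="$1" ip="$2"
  sed -i.bak "s|^SWARM_ADVERTISE_ADDR=.*|SWARM_ADVERTISE_ADDR=\"$ip\"|" "$script"
}

# Usage (placeholder IP):
#   set_swarm_advertise_addr ./install.sh 203.0.113.10
#   grep -n '^SWARM_ADVERTISE_ADDR=' ./install.sh   # confirm the change
#   sudo bash ./install.sh
```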
Uploading the license file
The next step is to upload the license file that we provided you. Once the license is validated (keep in mind that the instance needs an active outbound internet connection for this), you should receive an activation email. The activation email will be sent to the address that was previously confirmed with the Lang.ai team.
Adding an SSL certificate
In order to have a secure connection to the admin console and the platform, you should provide a private key and certificate to enable HTTPS.
An easy way to get a valid certificate is to use the Certbot tool from Let's Encrypt. You can generate the required private key and certificates for your domain using it. Before starting, make sure to have a DNS entry pointing to your instance IP.
The following example is for an Ubuntu 20.04 instance. See the Certbot documentation for instructions for other platforms.
# Update the package manager
sudo snap install core; sudo snap refresh core
# Install the tool
sudo snap install --classic certbot
# Start a standalone server to validate the domain
# Replace the domain with the one you have configured
sudo certbot certonly --standalone -d yourdomain.yourcompany.com
Once the command runs successfully, it displays where both files are located. Click on "If your private key and cert are already on the server, click here" and fill in both fields with those paths. You should then be redirected to the host name indicated on that screen, over a secure HTTPS connection.
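For reference, Certbot normally writes the files under `/etc/letsencrypt/live/<domain>/`. The sketch below builds the two paths for a domain so you can verify them before pasting them into the form. The function names are illustrative, and the paths printed by the certbot command itself remain the authoritative ones.

```shell
# Default Certbot location of the private key for a domain ($1).
certbot_key_path() {
  echo "/etc/letsencrypt/live/$1/privkey.pem"
}

# Default Certbot location of the certificate with its chain for a domain ($1).
certbot_cert_path() {
  echo "/etc/letsencrypt/live/$1/fullchain.pem"
}

# Usage (replace the domain with the one you have configured):
#   sudo ls -l "$(certbot_key_path yourdomain.yourcompany.com)"
#   sudo openssl x509 -noout -enddate -in "$(certbot_cert_path yourdomain.yourcompany.com)"
```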
Using an external Elasticsearch cluster and PostgreSQL database
In the next step, you will be prompted to choose between an internal Elasticsearch instance and an external one. The same choice applies to the PostgreSQL database.
Using an external cluster and database is strongly encouraged if you plan to use the platform in a production environment, as it will prevent scalability issues.
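Before selecting the external options, it can help to confirm that the instance can reach both services. This sketch uses the Elasticsearch cluster health endpoint and the PostgreSQL `pg_isready` utility. The host names are placeholders, the default ports 9200 and 5432 and the plain HTTP scheme are assumptions, and a secured cluster will additionally need credentials or HTTPS.

```shell
# Builds the cluster health URL for an Elasticsearch host ($1) and port ($2, default 9200).
es_health_url() {
  echo "http://$1:${2:-9200}/_cluster/health"
}

# Prints the cluster health JSON, or fails if the cluster is unreachable.
check_elasticsearch() {
  curl -fsS --max-time 10 "$(es_health_url "$1" "$2")"
}

# Succeeds when the PostgreSQL server at host $1 and port $2 (default 5432)
# is accepting connections. Requires the PostgreSQL client tools.
check_postgres() {
  pg_isready -h "$1" -p "${2:-5432}"
}

# Usage (placeholder host names):
#   check_elasticsearch es.internal.mycompany.com
#   check_postgres db.internal.mycompany.com
```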
Using Lang.ai Enterprise
Learn more about how to start using it here.
How long does an intent-induction process take?
It depends on the amount of text to process, the nature of the texts (vocabulary size and average document length), and the resources available.
As an example, on an instance with 8 vCPUs and 61 GB of RAM, the process may take:
- 3 minutes for a dataset of 3,000 tweets (292 KB)
- 5 minutes for a dataset of 9,500 tweets (911 KB)
Datasets with longer documents, even if smaller in total size, would probably take more time, as the vocabulary and the inner complexity of the texts could be greater.
Do I have to allow any outbound traffic?
Outbound traffic is required to validate the license when installing the platform. Fetching and installing updates also requires outbound traffic.
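A simple way to confirm that outbound HTTPS works from the instance is to request the installation script URL used earlier in this guide. This is only a sketch: the function name is illustrative, and the hosts contacted for license validation and updates are not listed here, so a successful request shows general outbound connectivity rather than access to every required endpoint.

```shell
# Succeeds when an HTTP(S) request to the URL ($1) gets a response within 10 seconds.
can_reach() {
  curl -sS -o /dev/null --max-time 10 "$1"
}

# Usage:
#   if can_reach https://get.replicated.com/langai/stable/swarm-init; then
#     echo "outbound HTTPS works"
#   else
#     echo "outbound HTTPS is blocked or the host is unreachable"
#   fi
```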
Do I have to allow any inbound traffic?
Yes, you must allow inbound traffic on ports 80 (if HTTPS is not set up), 443, and 8800.
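To verify the inbound rules, you can test the three ports from a machine that should have access. This sketch relies on bash's `/dev/tcp` redirection and the `timeout` command from coreutils. The function names and the host name in the usage comment are placeholders.

```shell
# Succeeds when a TCP connection to host $1 on port $2 opens within 3 seconds.
port_is_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Prints one line per port required by this guide for the given host ($1).
check_required_ports() {
  local host="$1" port
  for port in 80 443 8800; do
    if port_is_open "$host" "$port"; then
      echo "port $port: reachable"
    else
      echo "port $port: not reachable"
    fi
  done
}

# Usage (placeholder host name):
#   check_required_ports langai.mycompany.com
```

A port also shows as not reachable when nothing is listening on it yet, so run the check after the installation has finished.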
Where do I have to deploy the instance?
It can be deployed in a public or private subnet, as long as the required inbound traffic can be allowed from specific IP addresses or IP ranges based on user needs.
If the instance is deployed in a private subnet without a public IP, the instance must be reachable through its private IP by the users or systems that will use it.
Where is the data stored?
All the data is stored in the local file system of the instance; no other storage services are needed. Ensure that you have provisioned enough disk space for all your datasets before launching the instance.
How can I upgrade to newer versions?
The platform updates can be installed directly from the administration console.