The Repustate Server is a self-contained executable that provides the full functionality of the Repustate API but with the privacy of being hosted in your own data center. There are no quotas or usage restrictions when utilizing the Repustate Server. It is the ideal product for organizations who need text analytics at a very large scale.
The Repustate Server can be installed on any 64-bit operating system including:
Because it is platform agnostic, the Repustate Server can be installed on any dedicated hardware in your own private data center or on cloud infrastructure such as Azure, Google Compute Engine or AWS.
The recommended specs for hardware needed to run the Repustate Server are:
Note about Deep Semantic Search: To enable Deep Semantic Search and/or to use the entity extraction API call, your server's CPU must support the AVX and AVX2 instruction sets.
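On Linux, one way to confirm AVX/AVX2 support before installing is to look for the `avx` and `avx2` flags the kernel lists in `/proc/cpuinfo`. The following is a minimal Python sketch assuming an x86-64 Linux host; other operating systems expose CPU features differently, and the function names are illustrative only.

```python
from pathlib import Path
from typing import Set

REQUIRED_FLAGS = {"avx", "avx2"}

def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first "flags" line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list features as "flags : fpu vme ... avx avx2 ..."
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def missing_avx_flags(cpuinfo_text: str) -> Set[str]:
    """Return the required flags that are absent; empty means AVX and AVX2 are both present."""
    return REQUIRED_FLAGS - cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_avx_flags(cpuinfo.read_text())
        print("AVX and AVX2 supported" if not missing else f"missing: {sorted(missing)}")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```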
Once you've obtained your license, a link to your installer will be emailed to you. The steps that follow assume you've downloaded the installer executable onto the server that will host Repustate.
If you'll be using Deep Semantic Search, you have to configure the backend used for storing data. The Repustate installer creates a directory named config. In this directory, you'll find a file named deepsearch which contains sample configurations for all supported backends. Create a file called deepsearch in the config directory (or edit the sample) and specify your backend and its connection parameters. For example, if you were using PostgreSQL, your config/deepsearch file would look like:

```toml
[storage]
type = "postgres"
dsn = "user=postgres dbname=deepsearch sslmode=disable host=localhost port=5432 password=123"
```

The following backends are supported:
The great thing about the Repustate Server is that you can use the same code and clients for the public API as you can for the Server, so it's easy to switch from one to the other.
All API calls that you see on our public API work exactly the same way on your Repustate Server. The only difference is that instead of sending your API calls to https://api.repustate.com, you send them to the IP address(es) of your Repustate Server instance(s), on the port the server listens on (9000 by default).
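Since only the base URL changes, a client can take the base URL as a parameter and keep everything else the same. The following is a minimal Python sketch under stated assumptions: `SELF_HOSTED` is a hypothetical server address, the endpoint path is whatever your client already uses against the public API, and the form-encoded POST body is an illustrative assumption rather than a description of the Repustate request format.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PUBLIC_API = "https://api.repustate.com"
# Hypothetical address of one of your Repustate Server instances (default port 9000).
SELF_HOSTED = "http://10.0.0.5:9000"

def build_request(base_url: str, path: str, params: Dict[str, str]) -> Request:
    """Build a POST request for `path` relative to `base_url`.

    `path` is whatever endpoint path your client already uses against the
    public API; only `base_url` changes when you switch to the Server.
    The form-encoded body is an assumption for illustration.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    body = urlencode(params).encode("utf-8")
    return Request(url, data=body, method="POST")

def call(base_url: str, path: str, params: Dict[str, str], timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(build_request(base_url, path, params), timeout=timeout) as resp:
        return resp.read()
```

Switching between the public API and your own server is then a matter of passing `PUBLIC_API` or `SELF_HOSTED` as `base_url`.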
The server also lets you specify some options at the command line to configure its behaviour (run ./repustate -h to see them). Alternatively, the same options can be set via environment variables and/or a .env file in the $REPUSTATE_HOME directory.
Option | Environment variable | Default | Description |
---|---|---|---|
--host | REPUSTATE_HOST | localhost (127.0.0.1) | Specify the IP address the Repustate Server should bind to |
--port | REPUSTATE_PORT | 9000 | Specify which port the Repustate Server should listen on for incoming API calls |
--langs | REPUSTATE_LANGS | All | Specify which languages to load at startup. If you're only interested in analyzing a few languages, specify a comma-separated list of language codes to reduce startup time, e.g. --langs en,de,fr enables only English, German and French |
--verbose | REPUSTATE_VERBOSE | | If included, the Repustate Server outputs various status messages and periodically displays the mean response time for API calls. When set via environment variable, set it to 1. |
--license | | | If included, displays when your current license expires |
--version | | | Displays which version of the Repustate Server you're using |
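The environment-variable column can be used to script a launch. The following is a minimal Python sketch that maps options to the `REPUSTATE_*` variables listed in the table and starts the executable with them; the `KEY=VALUE` layout produced by `render_dotenv` is an assumption about the .env format, and the helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Optional

def repustate_env(host: str = "127.0.0.1", port: int = 9000,
                  langs: Optional[List[str]] = None, verbose: bool = False) -> Dict[str, str]:
    """Map launch options to the REPUSTATE_* environment variables from the table."""
    env = {"REPUSTATE_HOST": host, "REPUSTATE_PORT": str(port)}
    if langs:
        # Comma-separated language codes; leave unset to load all languages.
        env["REPUSTATE_LANGS"] = ",".join(langs)
    if verbose:
        # The table says to set REPUSTATE_VERBOSE to 1 to enable verbose output.
        env["REPUSTATE_VERBOSE"] = "1"
    return env

def render_dotenv(env: Dict[str, str]) -> str:
    """Render KEY=VALUE lines, e.g. for a .env file in $REPUSTATE_HOME (format assumed)."""
    return "".join(f"{key}={value}\n" for key, value in sorted(env.items()))

def start_server(executable: str = "./repustate", **options) -> subprocess.Popen:
    """Start the server with the options passed through its environment."""
    return subprocess.Popen([executable], env={**os.environ, **repustate_env(**options)})
```

Because the default host is localhost, a server that must accept calls from other machines (for example, from a load balancer) would need to bind to a reachable address, e.g. `start_server(host="0.0.0.0", langs=["en", "de", "fr"])`.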
When Repustate releases a new version of the Server, you will receive an email with a link to download an updated installer. Download and run the installer. It will create a new Repustate executable that is meant to replace the existing one you have. While you could stop the old server, replace the executable binary, and then restart, this would result in downtime of a minute or two.
To have API calls handled by the new version without downtime, replace the old executable with the new one and send a USR2 signal to the process ID of the old process. Any existing or in-flight API calls will be handled by the old process; once they're all done, the old process will shut down and the new one will take over. The process ID can be found in a file called `repustate.pid` in the same directory as the executable itself. For Linux users (this is not yet supported on Windows), this is how a graceful restart can be accomplished:
```shell
# Run from the directory that contains the repustate executable and repustate.pid
kill -USR2 "$(cat repustate.pid)"
```
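If your deployment tooling is written in Python, the same graceful restart can be scripted with the standard library. This is a minimal sketch assuming a POSIX system and that `install_dir` (a hypothetical parameter) is the directory holding the executable and `repustate.pid`; it should only be run after the new executable has replaced the old one.

```python
import os
import signal
from pathlib import Path

def parse_pid(text: str) -> int:
    """Parse the contents of repustate.pid into a process ID."""
    pid = int(text.strip())
    # os.kill treats 0 and negative values as process groups, so refuse them.
    if pid <= 0:
        raise ValueError(f"invalid PID in pid file: {text!r}")
    return pid

def graceful_restart(install_dir: str) -> int:
    """Send SIGUSR2 to the old server so the new executable takes over (POSIX only)."""
    pid = parse_pid((Path(install_dir) / "repustate.pid").read_text())
    os.kill(pid, signal.SIGUSR2)
    return pid
```

Validating the PID guards against an empty or corrupted pid file turning into a signal sent to a whole process group.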
To increase throughput and to add redundancy in the event of an unexpected outage, we advise deploying Repustate across multiple servers. By putting a load balancer such as HAProxy or nginx in front of your servers, you can round-robin your requests and spread the workload across your Repustate Servers.
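HAProxy or nginx handle this for you. If you would rather rotate servers inside a client instead, the same round-robin idea can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for a real load balancer (there are no health checks); the server URLs are hypothetical, and `send` stands in for whatever function actually makes your API call.

```python
import itertools
import threading
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

class RoundRobin:
    """Hand out server base URLs in turn; safe to share between threads."""

    def __init__(self, servers: Iterable[str]) -> None:
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)
        self._lock = threading.Lock()

    def pick(self) -> str:
        with self._lock:
            return next(self._cycle)

def call_with_failover(pool: RoundRobin, send: Callable[[str], T], attempts: int) -> T:
    """Call `send(base_url)` on successive servers until one succeeds.

    Connection failures (OSError, which includes urllib's URLError and
    socket timeouts) move on to the next server. Note that urllib's
    HTTPError is also an OSError, so filter it inside `send` if HTTP
    error responses should not trigger failover.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for _ in range(attempts):
        try:
            return send(pool.pick())
        except OSError as exc:
            last_error = exc
    raise last_error
```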
To accommodate this sort of architecture, Repustate Servers come with a built-in feature called Repustate Sync. Repustate will periodically poll your specified endpoints and update the rules on each server with the results the endpoints return. This allows you to add as many nodes as you want, as long as they can reach the endpoints you define.
To enable Repustate Sync, you must create a file called `sync` and put it in the `config` directory. The contents of the sync file are as follows:
```
sync_interval = 24h
sentiment_server = "http://example.com/sentiment-rules"
filter_server = "http://example.com/filters"
entity_server = "http://example.com/entities"
```
Each endpoint is expected to return JSON with an HTTP 200 status code. A status code other than 200 means the data will not be refreshed server-side; in particular, an HTTP 304 means the content hasn't changed and Repustate won't do any updates. The expected response format for each endpoint is as follows:
Sentiment:

```
{
  "apikey": "xxxxx",
  "rules": [
    {
      "lang": "en",
      "subaccount": "xxx",  // optional
      "text": "my rule",
      "sentiment": "pos",
      "id": "myid"          // optional
    }
  ]
}
```

Filters:

```
{
  "apikey": "xxxx",
  "filters": [
    {
      "subaccount": "xxx",  // optional
      "label": "canadian cities",
      "rule": "Montreal OR Toronto OR Vancouver"
    }
  ]
}
```

Entities:

```
{
  "apikey": "xxxx",
  "entities": [
    {
      "subaccount": "xxx",  // optional
      "lang": "en",         // optional, default is 'en'
      "title": "Repustate",
      "classifications": [
        "Org.software_company",
        "Org.saas_business"
      ],
      "aliases": [
        "Repustate Inc",
        "R-State"
      ]
    }
  ]
}
```

The `// optional` comments only mark optional fields; leave them out of real responses, since JSON does not allow comments.
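As an illustration, the three sync endpoints can be served by a small HTTP service. The following is a minimal Python sketch using only the standard library; it assumes Repustate polls with GET requests, reuses the paths from the sample sync file, and uses a placeholder API key. It always answers 200 with the full payload and does not attempt 304 handling.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

API_KEY = "xxxxx"  # placeholder: your Repustate API key

# Payloads follow the sample formats; the optional subaccount fields are omitted.
PAYLOADS = {
    "/sentiment-rules": {"apikey": API_KEY, "rules": [
        {"lang": "en", "text": "my rule", "sentiment": "pos", "id": "myid"}]},
    "/filters": {"apikey": API_KEY, "filters": [
        {"label": "canadian cities", "rule": "Montreal OR Toronto OR Vancouver"}]},
    "/entities": {"apikey": API_KEY, "entities": [
        {"lang": "en", "title": "Repustate",
         "classifications": ["Org.software_company", "Org.saas_business"],
         "aliases": ["Repustate Inc", "R-State"]}]},
}

class SyncHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        payload = PAYLOADS.get(self.path)
        if payload is None:
            # Any non-200 status means Repustate keeps its current data.
            self.send_error(404)
            return
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Start the sync service in a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), SyncHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The `sentiment_server`, `filter_server` and `entity_server` entries in config/sync would then point at this service's address with the matching paths.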
There are a few tweaks you can make to your servers to optimize the performance of the Repustate Server. First, we suggest not running anything other than Repustate on your server since, depending on your workload, Repustate can be very resource-intensive, particularly with RAM. We also suggest making the following changes to your /etc/sysctl.conf:
After making these changes, reload your sysctl.conf with sudo sysctl -p. This allows your OS to reuse TCP connections quickly and not run out of file descriptors during heavier loads.
We also recommend bumping up the total number of open files allowed by editing /etc/security/limits.conf and adding two entries:
The * specifies which users should have their open file limit increased. If you want to restrict the change to just one user, replace the * with the relevant username.
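Entries in limits.conf are applied when a new login session starts, so after editing the file, log in again as the user that runs Repustate and confirm the new limit took effect. The following is a minimal Python sketch for that check; the value you pass to `meets_minimum` should be whatever you configured in limits.conf (no particular number is implied here). For a server that is already running on Linux, /proc/<pid>/limits shows the limits that process actually has.

```python
import resource  # POSIX only; not available on Windows
from typing import Tuple

def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def meets_minimum(minimum: int) -> bool:
    """True if the soft open-file limit is at least `minimum` (or unlimited)."""
    soft, _hard = nofile_limits()
    return soft == resource.RLIM_INFINITY or soft >= minimum

if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={soft} hard={hard}")
```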