Choosing Hardware

The minimum recommended setup for hosting Pingstreams involves two separate servers:

  • Server 1: To host the Pingstreams Server, Dashboard, Widget, and Web Chat.
  • Server 2: Dedicated to MongoDB.

This basic setup gets the servers up and running but does not account for advanced quality attributes such as performance, high availability, backups, redundancy, and disaster recovery.

Instance Type Recommendation: The default recommendation is an AWS EC2 t2.small (or equivalent), which is sufficient for testing environments. For production environments, a c4.large instance provides better performance and stability.
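
As a minimal sketch, assuming you deploy on AWS with the boto3 SDK and already have an AMI, key pair, and security group (the identifiers below are placeholders, not real values), launching a production instance might look like this:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: your OS image
    InstanceType="c4.large",           # use t2.small for testing environments
    MinCount=1,
    MaxCount=1,
    KeyName="pingstreams-key",         # placeholder key pair name
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "pingstreams-server"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```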

Alternative with Heroku: If you prefer Heroku, we recommend at least the Basic (formerly Hobby) dyno type for small applications; note that Heroku discontinued its free dyno tier in November 2022, so development and testing deployments also require a paid plan.

For production environments with higher traffic and reliability requirements, consider a more robust server configuration:

  • Load Balancer: To manage traffic and increase availability by distributing the load across multiple instances.
  • Auto-scaling: To dynamically adjust server capacity based on traffic (see the sketch after this list).
  • Backup and Disaster Recovery: Enable regular automated backups and define a disaster recovery plan.
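
As an illustrative sketch of the auto-scaling item above, assuming an existing launch template and load-balancer target group (the names and ARN below are placeholders), a target-tracking policy with boto3 might look like this:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Create a group spanning multiple Availability Zones behind the load balancer.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="pingstreams-asg",  # placeholder name
    LaunchTemplate={"LaunchTemplateName": "pingstreams-lt", "Version": "$Latest"},
    MinSize=2,
    MaxSize=6,
    TargetGroupARNs=["arn:aws:elasticloadbalancing:..."],  # placeholder ARN
    AvailabilityZones=["us-east-1a", "us-east-1b"],
)

# Track average CPU: instances are added above 60% utilization, removed below.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="pingstreams-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)
```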

To serve Pingstreams’s front-end components (Dashboard, Widget, and Web Chat), we suggest using AWS S3 + CloudFront. This setup enables efficient and scalable distribution of static content.
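
A minimal upload sketch, assuming the front-end build output lives in a local ./dist directory and the bucket name is a placeholder for a bucket served via CloudFront:

```python
import mimetypes
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "pingstreams-frontend"  # placeholder bucket name

# Upload every file in the build directory with a sensible Content-Type,
# so CloudFront serves HTML, JS, and CSS with the correct headers.
for path in Path("dist").rglob("*"):
    if path.is_file():
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        s3.upload_file(
            str(path),
            BUCKET,
            str(path.relative_to("dist")),
            ExtraArgs={"ContentType": content_type},
        )
```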

Enable Gzip Compression on CloudFront to reduce network traffic and improve loading performance. More info here: https://aws.amazon.com/blogs/aws/new-gzip-compression-support-for-amazon-cloudfront/
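
Compression is a per-cache-behavior flag on the distribution. As a sketch, assuming an existing distribution (the ID below is a placeholder), you could enable it on the default cache behavior with boto3:

```python
import boto3

cloudfront = boto3.client("cloudfront")
DISTRIBUTION_ID = "E1234567890ABC"  # placeholder distribution ID

# Fetch the current configuration together with its ETag (required for updates).
result = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
config = result["DistributionConfig"]
config["DefaultCacheBehavior"]["Compress"] = True  # compress when clients accept it

cloudfront.update_distribution(
    Id=DISTRIBUTION_ID,
    IfMatch=result["ETag"],
    DistributionConfig=config,
)
```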

We recommend the following MongoDB configurations (the M-tier names refer to MongoDB Atlas clusters):

  • M10: Suitable for testing or low-traffic applications. It supports continuous backups for data security.
  • M30 or higher: Recommended for high-traffic applications or production environments, providing better performance and scalability.
  • Replica Set: Use at least three nodes, each located in a separate Availability Zone, to improve database reliability and availability. This setup reduces the risk of data loss and ensures greater service continuity (a minimal connection sketch follows this list).
  • Sharding (for high workloads): In cases of heavy database usage, consider implementing sharding to distribute the load and enhance performance.
  • Performance Monitoring: Use tools like MongoDB Compass or integrations with monitoring systems (e.g., CloudWatch) to track the database status and optimize resource allocation.
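
As a minimal connection sketch for the replica-set item above, assuming three nodes and a replica set named rs0 (the hostnames are placeholders), the driver discovers the topology and fails over automatically:

```python
from pymongo import MongoClient

# Placeholder hostnames -- one node per Availability Zone.
client = MongoClient(
    "mongodb://mongo-az1:27017,mongo-az2:27017,mongo-az3:27017/"
    "?replicaSet=rs0&w=majority&retryWrites=true",
    serverSelectionTimeoutMS=5000,
)

# The "majority" write concern keeps data on at least two of the three nodes,
# so a single-node failure cannot lose acknowledged writes.
db = client["pingstreams"]
db["healthcheck"].insert_one({"ok": True})
print(client.primary)  # currently elected primary node
```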

Other Backend Requirements (Third-Party Services)

To run Pingstreams in an on-premise environment, the following third-party backend services are required:

  • RabbitMQ – Message broker used for internal communication between microservices.
  • Redis – Caching layer that improves system performance by storing temporary data in memory.

Important

These services must be installed and properly configured before proceeding with the Pingstreams installation (a minimal connectivity check is sketched after this note).

  • For non-business-critical environments (e.g. testing or small-scale deployments), all services can be installed in a single-node setup on a single virtual machine or server.
  • For production-grade or business-critical environments, a high-availability (HA) setup with service replication and failover is strongly recommended to ensure reliability and uptime.
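
As a minimal pre-install check, here is a sketch using the pika and redis Python clients, assuming both services run locally on their default ports with default credentials (adjust hosts and credentials to your setup):

```python
import pika
import redis

# RabbitMQ: open and close a connection on the default AMQP port.
connection = pika.BlockingConnection(
    pika.URLParameters("amqp://guest:guest@localhost:5672/")  # default credentials
)
print("RabbitMQ reachable:", connection.is_open)
connection.close()

# Redis: PING returns True if the server is up and accepting connections.
r = redis.Redis(host="localhost", port=6379)
print("Redis reachable:", r.ping())
```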

Optional: Hosting Large Language Models (LLMs)

If you plan to host a Large Language Model (LLM) such as LLaMA 3, which requires substantial computational resources, here are additional hardware specifications and recommendations:

  • Architecture: NVIDIA Ampere or newer (e.g., A100, A6000, H100).
  • GPU Memory: At least 40 GB of HBM2 or newer per GPU; larger models require more (see the sizing sketch after this list).
  • CUDA Cores: At least 6,912 cores to handle intensive AI workloads.
  • NVLink: If available, use NVLink to connect multiple GPUs, increasing communication bandwidth between GPUs and improving performance for larger models.
  • Multi-core CPU: Minimum 16 cores (32 threads) to support distributed workloads. Preferably a recent AMD EPYC or Intel Xeon CPU.
  • Clock Speed: At least 2.5 GHz per core.
  • RAM Amount: Minimum 128 GB DDR4/DDR5. For particularly large models, consider 256 GB or more, especially if the model will handle multiple requests simultaneously.
  • Storage Type: NVMe SSD for faster data access and improved I/O performance.
  • Storage Capacity: Minimum 1 TB, with scalability based on the model size and any auxiliary data. For long-term projects, consider a distributed storage system.
  • Cooling and Power: High-performance GPUs like the A100 or H100 require adequate cooling and stable power supply. Ensure that the cooling infrastructure is sufficient, especially for on-premises setups.
  • Cluster Configurations: For large-scale applications, consider using a GPU cluster, for example, via a Kubernetes system with GPU support. This allows for elastic scaling based on workload and provides redundancy.
  • Networking: For clusters, consider using a high-speed network (e.g., InfiniBand) to minimize latency between nodes and enhance overall performance.
  • Containers and Virtualization: Use Docker or other containers for rapid deployment and to ensure portability of the model across different platforms.
  • Resource Management and Monitoring: Use tools such as nvidia-smi or NVIDIA Data Center GPU Manager (DCGM) to monitor GPU utilization and maintain system stability; the NVIDIA NGC catalog additionally provides GPU-optimized containers for deployment.
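
As a back-of-the-envelope sizing sketch for the GPU-memory item above (inference only; the figures are rough rules of thumb, not vendor specifications):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate for inference: weights * precision * ~20% overhead
    for activations and KV cache. Real usage varies with context and batch size."""
    return params_billion * bytes_per_param * overhead

for name, params in [("LLaMA 3 8B", 8), ("LLaMA 3 70B", 70)]:
    fp16 = estimate_vram_gb(params, 2.0)   # 16-bit weights
    int8 = estimate_vram_gb(params, 1.0)   # 8-bit quantized weights
    print(f"{name}: ~{fp16:.0f} GB at fp16, ~{int8:.0f} GB at int8")
```

Under these rough assumptions, an 8B model fits on a single 40 GB GPU, while a 70B model at fp16 needs roughly 168 GB and therefore several GPUs connected via NVLink, which matches the multi-GPU guidance above.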

Pingstreams can be deployed in private or public cloud environments using dedicated hardware configurations from cloud providers such as AWS, Azure, or Google Cloud, with support for high-end GPUs like the NVIDIA A100 and V100. This approach offers scalability and simplified management without the need to maintain physical hardware.

This guide provides a solid, scalable setup for running Pingstreams in both development and production environments, with optional infrastructure suggestions for hosting Large Language Models, so your infrastructure can grow with traffic demands.