The spotlight is currently on generative AI: ChatGPT, Stable Diffusion, and DreamStudio are showcasing impressive and continually advancing results. Intelligent assistants are reshaping how we approach tasks, from search and information analysis to coding, network security, and content creation.
Gen AI is poised to become a fundamental component in how enterprises handle and deliver IT services, transforming how business users accomplish their tasks. While the potential is vast, so are the associated challenges. Developing and deploying effective AI solutions can be a costly venture with a substantial risk of failure. Additionally, the computational demands of Gen AI, powered by large language models (LLMs), translate into significant electricity consumption.
Estimates suggest that training an LLM with over 175 billion parameters consumes a year’s worth of energy for 1,000 US households. Answering 100 million+ generative AI questions daily can result in electricity consumption equivalent to the daily energy use of 33,000 US households, as calculated by Dr. Sajjad Moazeni from the University of Washington.
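The household comparison above can be sanity-checked with back-of-envelope arithmetic. The per-query energy figure and the average household consumption used below are illustrative assumptions for the sketch, not measured values:

```python
# Back-of-envelope check of the energy comparison above.
# Assumed figures (illustrative placeholders, not measurements):
QUERIES_PER_DAY = 100_000_000      # 100 million generative AI queries per day
WH_PER_QUERY = 10                  # assumed ~10 Wh of electricity per query
HOUSEHOLD_KWH_PER_DAY = 29         # assumed average US household daily use

total_kwh_per_day = QUERIES_PER_DAY * WH_PER_QUERY / 1000  # Wh -> kWh
equivalent_households = total_kwh_per_day / HOUSEHOLD_KWH_PER_DAY

print(f"{total_kwh_per_day:,.0f} kWh/day ~= {equivalent_households:,.0f} households")
```

With these assumed inputs the total comes to about one million kWh per day, equivalent to roughly 34,000 households, in the same ballpark as the ~33,000 cited above.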
The question arises: How can CIOs deliver accurate and trustworthy AI without incurring the exorbitant energy costs and carbon footprint associated with such computational demands?
Here are six practical tips for deploying generative AI with reduced risk and cost:
- Foundation Models:
  - Begin with established foundation models, whether commercial offerings such as the OpenAI GPT models behind ChatGPT or open-source models such as Meta's Llama 2.
  - Leverage their active developer ecosystems and diverse application portfolios.
- Match Model to Application:
  - Choose models based on their suitability for specific applications, weighing general-purpose capability against specialized focus.
- Retrain for Efficiency:
  - Retrain foundation models on domain-specific datasets to improve accuracy and produce smaller, more efficient models.
  - Optimize models for targeted applications, shedding unnecessary parameters.
- Utilize Existing Infrastructure:
  - Avoid the need for massive GPU arrays; leverage existing infrastructure for practical AI training, retraining, and inference.
  - Modern CPUs with built-in AI acceleration can handle training and retraining effectively.
- Hardware-Aware Inference:
  - Optimize and tune inference applications for the specific hardware types and features they will run on.
  - Balance accuracy against model size and processing efficiency.
- Cloud Cost Management:
  - Be mindful of cloud spend when providing AI services through cloud-based APIs.
  - Consider on-premises and co-located infrastructure for workloads that perform well outside the cloud.
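As a concrete illustration of the accuracy-versus-size trade-off behind hardware-aware inference and parameter shedding, the sketch below applies symmetric 8-bit quantization to a toy weight vector using only the standard library. Real deployments would use a framework's quantization toolkit; the weights here are made up for illustration:

```python
import random

def quantize_int8(weights):
    """Symmetric int8 quantization: map floats in [-max, max] to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in quantized]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(1000)]  # toy "model" weights

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each weight now fits in 1 byte instead of 4 (float32), at the cost of a
# rounding error of at most half a quantization step per weight.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max round-trip error: {max_err:.4f} (half-step = {scale / 2:.4f})")
```

Moving from 32-bit floats to 8-bit integers cuts memory footprint and bandwidth by roughly 4x, which is exactly the kind of optimization that lets inference run well on CPUs with integer acceleration, provided the resulting accuracy loss is measured and acceptable for the target application.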
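On the cloud cost point, a simple break-even calculation can show when steady, predictable inference traffic justifies on-premises or co-located hardware. Every number below is a hypothetical placeholder, not a real price:

```python
# Hypothetical break-even: cloud API spend vs. amortized on-prem hardware.
# All prices are illustrative placeholders, not real vendor rates.
CLOUD_COST_PER_1K_QUERIES = 0.50   # assumed $ per 1,000 API calls
QUERIES_PER_MONTH = 20_000_000
ONPREM_CAPEX = 60_000              # assumed server purchase price, $
ONPREM_OPEX_PER_MONTH = 1_500      # assumed power, space, and admin, $

cloud_monthly = QUERIES_PER_MONTH / 1000 * CLOUD_COST_PER_1K_QUERIES
breakeven_months = ONPREM_CAPEX / (cloud_monthly - ONPREM_OPEX_PER_MONTH)

print(f"cloud: ${cloud_monthly:,.0f}/month; "
      f"on-prem pays off after about {breakeven_months:.0f} months")
```

Under these assumptions the hardware pays for itself within the first year; with spiky or low traffic the same arithmetic would favor staying in the cloud, which is why the calculation should be run per workload.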
In the realm of Gen AI, options abound. Despite the perception that generative AI is a technology reserved for well-funded organizations, many high-performance models, including LLMs, run efficiently on standard CPU-based data centers or cloud instances. CIOs can strategically leverage these options to deliver transformative AI solutions without shouldering the risks and costs of a fully independent development effort.
DELTA Data Protection & Compliance Academy & Consulting – firstname.lastname@example.org