FAQ: AC 5.X Professional AI Services Deployment

How do I create a Resource Group in Azure?
- Follow these steps:
  - Start creating a "Resource Group".
  - Select "Basics" and provide the project and resource details.
  - Enter the name, value, and resource for each required tag.
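
The portal steps above correspond to a single create-or-update request against Azure Resource Manager. The following is a minimal Python sketch of that request, not the product team's procedure: it only builds the URL and JSON body and sends nothing. The subscription ID, group name, region, tag names, and tag values are hypothetical placeholders, and the `api-version` value is an assumption to check against the current REST reference.

```python
import json

# Hypothetical placeholders; substitute real values for your subscription.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "rg-ac5x-demo"
REGION = "eastus"


def build_resource_group_request(subscription_id, name, region, tags):
    """Return (url, json_body) for a 'create or update resource group' PUT call."""
    if not tags:
        raise ValueError("at least one tag is expected for this deployment")
    url = (
        "https://management.azure.com/subscriptions/"
        f"{subscription_id}/resourcegroups/{name}?api-version=2021-04-01"
    )
    # The body carries the region ("location") and the tags as name/value pairs.
    body = {"location": region, "tags": dict(tags)}
    return url, json.dumps(body, sort_keys=True)


url, body = build_resource_group_request(
    SUBSCRIPTION_ID,
    RESOURCE_GROUP,
    REGION,
    {"project": "ac5x", "environment": "dev"},  # illustrative tag names and values
)
print(url)
print(body)
```

Building the request separately from sending it makes the tag set easy to review before anything is created.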
What is the App Service Plan and how is it created?
- The App Service Plan is used to host applications. To create it:
  - Specify the name, operating system, and region.
  - Provide the necessary tags (name, value, and resource).
What steps are involved in creating a Web App?
- The following steps should be followed:
  - Specify the application name and publish type.
  - Choose the runtime stack and Java web server stack.
  - Define OS requirements and select a deployment region.
  - Choose the pricing plan recommended by the product team.
  - Provide GitHub account reference for image retrieval.
  - Configure networking options (public access should be set to "ON").
  - Enable monitoring with Application Insights.
How do I manage APIs within AC 5.X?
- Create an "API Management Service" and follow these steps:
  - Add instance details and select the pricing tier.
  - Configure APIs by adding display name, description, web service URL, URL scheme, and API URL suffix.
  - Add and update policies as needed, and define the GET and PUT operations.
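
API Management policies are written as XML. As an illustration of the "add and update policies" step, the sketch below generates a policy document containing a `rate-limit` and an `ip-filter` element using only the Python standard library. The call limit, renewal period, and IP address are hypothetical values, and the helper `build_policy` is an illustrative name, not part of any Azure SDK.

```python
import xml.etree.ElementTree as ET


def build_policy(calls, renewal_seconds, allowed_ips):
    """Build a policy document with a rate limit and an IP allow filter."""
    policies = ET.Element("policies")
    inbound = ET.SubElement(policies, "inbound")
    ET.SubElement(inbound, "base")  # keep policies inherited from outer scopes
    ET.SubElement(
        inbound,
        "rate-limit",
        {"calls": str(calls), "renewal-period": str(renewal_seconds)},
    )
    ip_filter = ET.SubElement(inbound, "ip-filter", {"action": "allow"})
    for ip in allowed_ips:
        ET.SubElement(ip_filter, "address").text = ip
    # The remaining sections only inherit from the outer scope in this sketch.
    for section in ("backend", "outbound", "on-error"):
        ET.SubElement(ET.SubElement(policies, section), "base")
    return ET.tostring(policies, encoding="unicode")


# Hypothetical limits: 100 calls per 60 seconds from one documentation-range IP.
policy_xml = build_policy(100, 60, ["203.0.113.10"])
print(policy_xml)
```

The generated text can be pasted into the policy editor for review; the actual limits should come from the product team's guidance.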
How do I create an AKS (Azure Kubernetes Service) Cluster?
- To create an AKS Cluster:
  - Go to "Basics" and add cluster details.
  - Configure and deploy the cluster.
What is Milvus and how is it configured?
- Milvus is an open-source vector database designed for similarity search and AI applications. The deployment guide lists Milvus as part of the configuration but does not detail its specific settings.
What are the best practices for monitoring web applications?
- Use the "Monitoring" option within Application Insights to view the metrics and diagnostic data collected for the web application.
What are the key considerations for networking during deployment?
- Ensure that public access is enabled ("ON") and disable network injection by setting it to "OFF" during configuration.

Technical Questions: AC 5.X Professional AI Services Deployment

Resource Groups and Tags:
- Q: What are the critical tags required when creating a Resource Group in Azure for AI services deployment, and why are they necessary?
- A: Tags like name, value, and resource are critical. They help in organizing, tracking, and managing resources efficiently across the deployment. Tags also facilitate cost management and reporting by enabling better categorization of resources.
- Q: How do resource tags impact the management and organization of deployed assets?
- A: Tags allow administrators to filter and group resources, simplifying asset management across environments (development, testing, production). They also aid in automating tasks like billing and monitoring.
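
To make the filtering and cost-grouping role of tags concrete, here is a small Python sketch over an in-memory inventory. The resource names, tag names, and cost figures are invented for illustration; a real inventory would come from an Azure resource listing or a cost export.

```python
from collections import defaultdict

# Hypothetical inventory with made-up names and monthly costs.
resources = [
    {"name": "app-ac5x-web", "tags": {"environment": "production"}, "monthly_cost": 120.0},
    {"name": "aks-ac5x", "tags": {"environment": "production"}, "monthly_cost": 300.0},
    {"name": "app-ac5x-dev", "tags": {"environment": "development"}, "monthly_cost": 45.5},
    {"name": "storage-misc", "tags": {}, "monthly_cost": 10.0},
]


def filter_by_tag(items, key, value):
    """Return the resources whose tag `key` equals `value`."""
    return [r for r in items if r.get("tags", {}).get(key) == value]


def cost_by_tag(items, key):
    """Sum monthly cost per tag value; untagged resources are grouped together."""
    totals = defaultdict(float)
    for r in items:
        totals[r.get("tags", {}).get(key, "untagged")] += r["monthly_cost"]
    return dict(totals)


print([r["name"] for r in filter_by_tag(resources, "environment", "production")])
print(cost_by_tag(resources, "environment"))
```

The "untagged" bucket shows why consistent tagging matters: anything missing the tag cannot be attributed to an environment.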
App Service Plan Configuration:
- Q: How do you determine the appropriate operating system and region when configuring an App Service Plan for AC 5.X?
- A: The operating system depends on the stack requirements (e.g., Linux for Java-based stacks), while the region is selected based on latency, compliance with data sovereignty laws, and proximity to the user base for optimal performance.
- Q: What factors influence the selection of a pricing plan for an App Service Plan, and how should they be evaluated based on project requirements?
- A: The pricing plan depends on factors like the app’s expected traffic, performance needs, scaling requirements, and budget constraints. For AC 5.X, the product team’s recommendation is vital, balancing cost with scalability and high availability.
Web App Deployment:
- Q: What are the key differences between various runtime stacks available when deploying a web app, and how does the Java Web Server Stack influence the app's performance?
- A: Different runtime stacks like .NET, Python, or Java offer various optimizations for language-specific applications. The Java Web Server Stack is optimized for Java apps, supporting features like servlet management and JDBC integration, which can improve load handling and database interactions.
- Q: How do you configure public access while ensuring network security for a deployed Web App?
- A: Set public access to "ON" so users can reach the app, then add firewall rules, secure authentication (e.g., OAuth), and TLS encryption (HTTPS) so that unauthorized access is blocked and data is transmitted securely.
- Q: What considerations should be made when linking a Web App to a GitHub repository for image retrieval?
- A: Ensure the repository contains the correct version of the app's image, and that access permissions are properly configured. Continuous deployment pipelines should be set up to automate the deployment of new changes, ensuring version control and rollback capabilities.
API Management and Configuration:
- Q: What is the role of API Management Service in the context of AC 5.X, and how does it contribute to maintaining a scalable infrastructure?
- A: The API Management Service facilitates the secure, scalable exposure of backend services via APIs. It allows rate limiting, authentication, and monitoring, ensuring that APIs can handle high traffic and adhere to security policies while integrating with various client apps.
- Q: How do you configure APIs to handle communication between different systems, and what are the best practices for defining API URL schemes and suffixes?
- A: APIs are configured by specifying their base URL, adding an appropriate suffix (e.g., /v1/ for versioning), and setting up endpoint operations like GET and PUT. Best practices include using clear, descriptive names for URLs, adding proper documentation, and using HTTPS for secure communications.
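
As a sketch of how the URL scheme, gateway host, and API URL suffix combine into the address clients call, the following Python helper assembles the pieces and rejects non-HTTPS schemes. The host `contoso.azure-api.net`, the suffix `ac5x/v1`, and the `/status` operation are hypothetical examples, not values taken from the AC 5.X guide.

```python
from urllib.parse import urlunsplit


def build_api_url(scheme, gateway_host, api_suffix, operation_path=""):
    """Join scheme, host, API suffix, and operation path into one URL."""
    if scheme != "https":
        raise ValueError("only HTTPS is accepted in this sketch")
    # Strip stray slashes so the segments join with exactly one separator.
    segments = [p.strip("/") for p in (api_suffix, operation_path) if p.strip("/")]
    path = "/" + "/".join(segments)
    return urlunsplit((scheme, gateway_host, path, "", ""))


# Hypothetical gateway host and versioned suffix.
print(build_api_url("https", "contoso.azure-api.net", "ac5x/v1", "/status"))
```

Keeping the version in the suffix lets a later `v2` be published alongside `v1` without breaking existing clients.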
API Policies and Operations:
- Q: How are HTTP(s) endpoints configured within an API Management Service, and how do policy selection and management affect the API's behavior?
- A: HTTP(s) endpoints are set by providing a URL and selecting policies such as rate limits, IP filtering, or caching. Policies directly impact performance, security, and scalability by controlling how requests are processed and ensuring compliance with business rules.
- Q: What is the difference between GET and PUT operations in the context of AC 5.X, and how are these used in managing system interactions?
- A: GET retrieves data from the system without changing it, while PUT creates or replaces the data at a given URL. In AC 5.X, GET operations fetch information (e.g., system status), and PUT operations update configurations or submit new data to backend services.
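
The difference between the two operations can be illustrated with a tiny in-memory handler. This is a sketch of general HTTP semantics under stated assumptions, not the AC 5.X backend: the `store` dictionary, the `config/search` path, and the `top_k` field are hypothetical.

```python
# In-memory stand-in for a backend configuration store (hypothetical names).
store = {"config/search": {"top_k": 10}}


def handle(method, path, body=None):
    """Return (status_code, payload) for a small subset of HTTP semantics."""
    if method == "GET":
        # GET only reads; it never changes the stored state.
        if path not in store:
            return 404, None
        return 200, store[path]
    if method == "PUT":
        # PUT replaces the whole resource; repeating it leaves the same state.
        created = path not in store
        store[path] = body
        return (201 if created else 200), body
    return 405, None


print(handle("GET", "config/search"))
print(handle("PUT", "config/search", {"top_k": 20}))
print(handle("PUT", "config/search", {"top_k": 20}))  # same request, same end state
```

Because a repeated PUT produces the same end state, clients can safely retry it after a timeout.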
AKS (Azure Kubernetes Service) Cluster Deployment:
- Q: What cluster details are critical during the creation of an AKS Cluster for AI Services, and how do these settings impact system scalability?
- A: Critical details include cluster name, region, node size, and node count. Node size and count determine the compute capacity available to workloads and how far the cluster can scale with demand, while a region close to users reduces latency.
- Q: How does the integration of AKS clusters enhance the deployment and orchestration of AI services in production environments?
- A: AKS clusters enable automatic scaling, load balancing, and orchestration of AI workloads, ensuring that services can handle fluctuating traffic while maintaining uptime. They simplify the management of containerized applications in production.
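
Automatic scaling in Kubernetes is commonly handled by the Horizontal Pod Autoscaler, whose core rule is: desired replicas = ceil(current replicas × current metric / target metric). The Python sketch below reproduces that rule with minimum and maximum bounds. It omits the autoscaler's tolerance and stabilization behavior, and the replica limits and metric values are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Apply the proportional scaling rule, clamped to the allowed range."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 3 replicas at 90% average CPU against a 60% target: 3 * 90 / 60 = 4.5, rounded up.
print(desired_replicas(3, 90, 60))
```

The clamp matters in practice: without an upper bound, a traffic spike could request more pods than the node pool can schedule.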
Milvus Configuration:
- Q: What are the key steps involved in configuring Milvus for vector similarity search, and how does this configuration integrate with AC 5.X for AI-driven applications?
- A: The key steps include defining the data structure (the collection schema), selecting the indexing method (e.g., IVF or HNSW), and tuning query parameters for efficient similarity searches. Within AC 5.X, Milvus indexes large datasets so that AI services can perform fast, scalable real-time search and retrieval.
- Q: How does the performance of Milvus affect data retrieval and processing times in AI applications?
- A: The performance of Milvus, particularly in indexing and querying, directly impacts the speed of data retrieval. Efficient indexing reduces search times for AI models and improves the responsiveness of applications handling large-scale similarity searches.
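
To show what a similarity search computes, the sketch below runs an exact, brute-force cosine-similarity search in plain Python. This is deliberately not the Milvus API and not an IVF or HNSW index: those indexes exist to approximate this result much faster on large datasets. The document keys and vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine_similarity(query, v), key) for key, v in vectors.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Hypothetical two-dimensional embeddings.
vectors = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
results = top_k([1.0, 0.1], vectors, 2)
print(results)
```

The brute-force version compares the query with every stored vector, which is why index choice dominates retrieval time once the collection grows.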
Application Insights for Monitoring:
- Q: How can Application Insights be used to monitor deployed web applications, and what specific metrics should be tracked to ensure optimal performance?
- A: Application Insights can track metrics such as request response times, failed requests, CPU and memory usage, and dependency call failures. Tracking these helps identify bottlenecks, reduce errors, and tune performance.
- Q: How do you configure alerting and diagnostic settings in Application Insights to preemptively detect and resolve issues?
- A: Configure alert thresholds for critical metrics (e.g., response time exceeding a limit), set up log-based diagnostics to capture detailed error reports, and enable email or SMS notifications to ensure timely intervention when issues arise.
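
The threshold logic behind such an alert can be sketched in a few lines of Python. This illustrates the evaluation rule only and is not how Application Insights is configured: the function name, the 1000 ms threshold, and the minimum sample count are assumptions chosen for the example.

```python
from statistics import mean


def should_alert(samples_ms, threshold_ms, min_samples=3):
    """Return True when the window's average response time exceeds the threshold."""
    # Too few samples in the window: stay quiet rather than alert on noise.
    if len(samples_ms) < min_samples:
        return False
    return mean(samples_ms) > threshold_ms


# Hypothetical response times (milliseconds) from one evaluation window.
print(should_alert([900, 1200, 1500], threshold_ms=1000))
print(should_alert([200, 300, 250], threshold_ms=1000))
```

Averaging over a window rather than alerting on a single slow request reduces false alarms from one-off spikes.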
Networking and Security Considerations:
- Q: What are the security implications of setting public access to "ON" for web applications, and how can you mitigate risks without disabling public access?
- A: Public access allows anyone on the internet to reach the application, which increases its exposure to attack. To mitigate this, require secure authentication (e.g., OAuth), restrict source addresses with an IP allowlist, and enforce TLS for encrypted communication.
- Q: How does disabling network injection ("OFF") affect the deployment and performance of AI services in AC 5.X?
- A: With network injection set to "OFF", the Web App is created without virtual network integration, so it cannot reach resources that are exposed only on a private network. This keeps the deployment simpler, but any AI service that depends on private network resources will need network injection enabled or another access path.
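
Restricting access by source IP address, one of the mitigations for a publicly reachable app, comes down to checking the client address against a list of permitted ranges. The sketch below shows that check with Python's standard `ipaddress` module. The CIDR ranges are documentation-reserved examples, not real AC 5.X values, and in Azure this rule would normally be configured as an access restriction rather than in application code.

```python
import ipaddress

# Hypothetical allowlist using documentation-reserved address ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.7/32")
]


def is_allowed(client_ip):
    """Return True when the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


print(is_allowed("203.0.113.42"))
print(is_allowed("198.51.100.8"))
```

A `/32` entry admits exactly one address, which suits a single fixed office or gateway IP.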
Scaling and Performance:
- Q: How can you optimize the deployment of AC 5.X AI services to ensure they scale effectively in a production environment?
- A: Use auto-scaling for the AKS cluster, implement load balancing for web apps, and monitor resource utilization in Application Insights to trigger scaling actions based on demand. Scaling ahead of known traffic patterns also helps keep performance smooth.
- Q: What are the key performance metrics to monitor for ensuring that deployed AI services are meeting SLA requirements?
- A: Monitor request response times, error rates, CPU and memory usage, and uptime. Ensuring low latency, minimal downtime, and efficient resource utilization will help maintain SLAs.
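
The SLA metrics named above reduce to simple arithmetic. The Python sketch below computes error rate, availability, and a nearest-rank latency percentile. The sample figures and the 99.9% availability target are hypothetical; the actual SLA targets are whatever the service agreement specifies.

```python
import math


def error_rate(total_requests, failed_requests):
    """Fraction of requests that failed."""
    return failed_requests / total_requests if total_requests else 0.0


def availability(uptime_minutes, total_minutes):
    """Fraction of the period during which the service was up."""
    return uptime_minutes / total_minutes


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value covering pct% of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical month: 43,200 minutes with 30 minutes of downtime.
uptime = availability(43_170, 43_200)
latencies_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(error_rate(1000, 5))
print(uptime >= 0.999)
print(percentile(latencies_ms, 95))
```

Percentiles are usually more informative than averages for latency targets, because a few very slow requests can hide behind a healthy mean.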