Thursday, October 21, 2021

Achieve higher levels of cloud maturity with socially responsible IT



What is the highest level of cloud maturity in an organization? In my opinion, it is delivering capabilities that are environmentally & socially responsible.

Google can help you develop applications and services that are not just cool but also economically and socially responsible. Google has been carbon neutral since 2007 and has committed to operating on 100% carbon-free energy (CFE) 24x7 by 2030. Google data centers are architected to use much less energy than a typical data center - read more about it here.

If you missed the Next ‘21 session on ways to reduce your carbon footprint, you can still catch up on it here.

Just by running your workloads on Google Cloud, you can have a positive impact on the planet. However, it doesn’t have to stop there. Google Cloud offers several metrics, tools and dashboards to help you understand and reduce your carbon footprint including the ability to adhere to GHG Scope 3 standard reporting requirements.

Here are 3 strategies that you can employ to reduce your carbon footprint;

1. Choose the most eco-friendly cloud region

Google publishes carbon data for all its cloud regions and you can see the CFE% and the local electricity grid’s carbon intensity. This will help you pick cleaner cloud regions (if you can) to run your workloads.

  • CFE % - This metric is computed from hourly data and represents the average percentage of carbon-free energy consumed in that particular region. In simple terms, you can think of this as the percentage of time your application would run on carbon-free energy. The higher the percentage, the better.

  • Grid Carbon Intensity - This metric indicates the gross carbon emissions from the grid per unit of energy. In layman's terms, the lower this value, the better.

On the Google Cloud console, some cloud regions may have the Low CO2 marker against them. This means that the region has a Google CFE% of at least 75% and/or a grid carbon intensity of 200 gCO2eq/kWh or less.

Using a combination of the CFE% and grid carbon intensity indicators, you can now choose to run your workloads in regions that are more eco-friendly.
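As a thought experiment, the selection logic is simple enough to script. Here is a minimal Python sketch - the numbers below are made up purely for illustration; the real per-region values are the ones Google publishes and they change over time:

# Illustrative only: invented CFE% / grid carbon intensity values for a few regions
regions = {
    "region-a": {"cfe_percent": 90, "grid_gco2_per_kwh": 150},
    "region-b": {"cfe_percent": 60, "grid_gco2_per_kwh": 300},
    "region-c": {"cfe_percent": 60, "grid_gco2_per_kwh": 500},
}

def cleanest(candidates):
    # Prefer the highest CFE%, then break ties with the lowest grid carbon intensity
    return max(candidates, key=lambda r: (candidates[r]["cfe_percent"],
                                           -candidates[r]["grid_gco2_per_kwh"]))

print(cleanest(regions))   # -> region-a with these sample numbers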

2. Leverage fully managed services where possible

Fully managed offerings are typically more efficient than manually operated ones. For instance, consider the following scenarios;

  • Running an application on Google Compute Engine (GCE) with auto scaling capabilities is more efficient than running it on a statically sized on-premise server farm

  • An application that can be containerized (GKE) can offer higher server density, resulting in reduced resource usage

  • On the far end of the spectrum, if your workloads can leverage fully managed serverless offerings such as Cloud Run (for containerized workloads), Cloud Functions (FaaS) or BigQuery (serverless data warehouse), they benefit not just from economies of scale but also from reduced idle resources through smart resource allocation and demand-based scaling (a quick sketch follows below)

This may tie directly to your organization’s cloud maturity level and adoption. In a hybrid environment you may still have some monoliths that warrant the use of larger machines/VMs, but having this understanding will still help you with your longer-term “socially responsible” goals.
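Tying back to the serverless point above, a Cloud Run deployment that scales to zero when idle and caps its footprint might look roughly like this (service, project and image names are placeholders; check the gcloud reference for the current flags):

gcloud run deploy my-service \
    --image gcr.io/my-project/my-app \
    --region us-central1 \
    --min-instances 0 \
    --max-instances 10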

3. Optimize resource needs and utilization

This may sound like a cliché, but just having the ability to track, monitor and report on CFE metrics is a great starting point. Solutions like Active Assist can offer insights and recommendations to optimize your cloud usage. Here are some things to ponder;

  • Carrying an on-premise mindset to the cloud - Over-provisioning VMs / server resources results not just in increased costs but, worse, in a larger carbon footprint

  • Forgot to close the faucet - Unused, running VMs are just bad practice. Having a governance model in place would go a long way in reducing emissions. For VMs that are only needed at certain times, try leveraging the Scheduled VMs option (see the sketch after this list). The Idle VM recommendation can be handy to identify and turn down idle VM resources.

  • Try refactoring your monolith applications into microservices. Microservices are nimble, scale independently and can offer higher efficiencies and lower TCO - in stark contrast to monoliths with a significantly larger carbon footprint

  • Lifting & Shifting workloads? This is not a bad option especially if you are just embarking on your cloud journey. Oftentimes, this may be the only option. However, think of creative ways to optimize it for the cloud. Can you separate the analytics capabilities? Containerize peripheral applications / services?

  • Consider running your standalone batch workloads in a region with higher CFE%

  • Preemptible VMs can be a great way to save on operational costs and reduce your carbon footprint - if you have long-running yet stateless workloads that can take advantage of them.
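Here is a rough sketch of the Scheduled VMs idea mentioned above, using an instance schedule resource policy (names, region, timezone and cron expressions are placeholders; verify the exact flags in the gcloud reference):

# Create a schedule: start weekdays at 8am, stop at 6pm
gcloud compute resource-policies create instance-schedule office-hours \
    --region us-central1 \
    --vm-start-schedule "0 8 * * MON-FRI" \
    --vm-stop-schedule "0 18 * * MON-FRI" \
    --timezone "America/Chicago"

# Attach the schedule to an existing VM
gcloud compute instances add-resource-policies my-vm \
    --zone us-central1-a \
    --resource-policies office-hours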

Ready to embark on your socially responsible app development?

What is your level of cloud maturity?


Friday, December 25, 2020

Build your own COVID-19 Data Analytics Leveraging Open Source - Serverless Fn & Autonomous Database

First, a huge thanks to all the frontline workers, healthcare professionals, technology contributors, non-profit organizations and everyone else fighting the pandemic every day despite the risks involved and, more importantly, the sacrifices - including the one thing that can never be retrieved - TIME.

As Oracle Cloud continues to aid COVID-19 vaccine research & trials and also help manage the COVID-19 vaccine distribution program, I came up with a simple (humble) contribution that helps analyze and report COVID-19 data.

Although there are several dashboards and reports readily available on the web, they cater to a specific region, locale & metric. For example, what if one would like to understand the total COVID-19 cases reported in Belarus on/up to a specific date, sliced by total tests conducted, new positive cases, and hospitalizations per million against a median age group?

Data is publicly available, but making sense of it is the key. In this article, we will see how we can build our own COVID-19 analytics - leveraging simple & popular open source tools and technology with the help of public cloud (e.g. Autonomous Database in Oracle Cloud, which is auto-managed, auto-tuned & auto-patched) [PS: You can also replace Autonomous DB with MySQL]. What's more - we can potentially achieve this for free with an Always Free cloud instance.

Let's quickly take a look at the high-level solution architecture;


Each component of this architecture is designed to be loosely coupled and can be replaced or further enhanced for functionality/convenience. For instance, I have leveraged the open-source serverless Fn project that natively runs on Oracle Cloud - this can be replaced with Node.js or Java code running in a Kubernetes/Docker container. Similarly, the Autonomous Database can be replaced with a MySQL DB.

Let's take a look at the key components of this solution;

1) Source dataset [Courtesy: Our World in Data]

"Our World in Data" offers COVID-19 data in a variety of data formats and most importantly offers daily updates - so we can keep our analytics up-to-date.

In our case, we get the dataset in csv format here

2) Python script deployed on Oracle Fn

I have leveraged the Oracle Cloud Functions (based on the open-source Fn project) serverless platform to deploy the simple Python script that downloads the COVID-19 dataset into an object storage bucket.

The choice of Oracle Cloud Functions is particularly helpful in this case because I don't have to manage any infrastructure or deal with packaging the docker container and version controlling it. It lets me focus only on the business logic. Also, it natively supports a variety of programming languages, including Python, Go & Java. Most importantly, it has built-in security and offers out-of-the-box support for event notifications & triggers and the ability to expose functions as APIs.

Pre-Req: 

Create a dynamic group for Oracle Functions and ensure you have a policy defined in your compartment/tenancy that grants Oracle Functions the ability to access/read the CSV in the object storage bucket.

Instances that meet the criteria defined by any of these rules will be included in the dynamic group.

ALL {resource.type = 'fnfunc', resource.compartment.id = 'ocid1.compartment.oc1..abcdefgxyz'}

Allow dynamic-group FnDynamicGroup to manage objects in compartment sathya.ag

Let's create an Oracle Functions application;

Oracle Cloud -> Developer Services -> Functions -> Create Application & give it a name

Applications let us group several Oracle Functions. To create a serverless function;

For quick setup, you can leverage the Cloud Shell under the Getting Started instructions to deploy the following Python code. The Oracle Functions platform packages the code as a docker container, uploads the docker image to the default Oracle Cloud docker registry and automatically deploys it as a serverless function with an invoke endpoint.
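If you want to script that step yourself rather than follow the console walkthrough, the Fn CLI flow is roughly the following (the app/function names covid-app and covid-loader are placeholders I've made up; the actual function code is shown right after):

fn init --runtime python covid-loader   # scaffolds func.py, func.yaml and requirements.txt
cd covid-loader
# replace the generated func.py with the handler code below, then build & deploy:
fn -v deploy --app covid-app            # builds the image, pushes it to the registry and deploys the function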

import io
import json
import logging
import urllib.request

import oci
from fdk import response

def progress_callback(bytes_uploaded):
    print("{} additional bytes uploaded".format(bytes_uploaded))

def handler(ctx, data: io.BytesIO = None):
    logging.getLogger().info("Got incoming request")
    # Resource principal signer - lets the function call OCI APIs without stored credentials
    signer = oci.auth.signers.get_resource_principals_signer()
    object_name = bucket_name = namespace = ""
    try:
        # Key-value pairs set in the function's configuration section
        cfg = ctx.Config()
        input_bucket = cfg["input-bucket"]
        processed_bucket = cfg["processed-bucket"]
        input_csv = cfg["input-csv"]
        object_name = cfg["object-name"]
    except Exception as e:
        print('Missing function configuration parameters', flush=True)
        raise
    logging.getLogger().info("before calling load data {0} {1} {2}".format(input_bucket, input_csv, object_name))

    # Download the latest COVID-19 dataset into the function's local /tmp
    logging.getLogger().info("download start!")
    filename, headers = urllib.request.urlretrieve(input_csv, filename="/tmp/covid.csv")
    logging.getLogger().info("download complete!")

    # Upload the downloaded CSV into the input object storage bucket
    load_data(signer, namespace, input_bucket, filename, object_name)
    #move_object(signer, namespace, input_bucket, processed_bucket, object_name)

    return response.Response(
        ctx,
        response_data=json.dumps({"status": "Success"}),
        headers={"Content-Type": "application/json"}
    )

def load_data(signer, namespace, bucket_name, input_csv, object_name):
    logging.getLogger().info("inside load data function {0} {1} {2}".format(signer, namespace, bucket_name))
    client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
    try:
        print("INFO - About to read object {0} from local folder and upload to bucket {1}...".format(object_name, bucket_name), flush=True)
        namespace = client.get_namespace().data

        # Use UploadManager to do a multi-part upload of the file, with 3 parallel uploads
        logging.getLogger().info("before calling uploadmanager")
        upload_manager = oci.object_storage.UploadManager(client, allow_parallel_uploads=True, parallel_process_count=3)
        response = upload_manager.upload_file(namespace, bucket_name, object_name, input_csv, progress_callback=progress_callback)
        logging.getLogger().info("response status {0}".format(response.status))
        if response.status == 200:
            message = "Successfully uploaded %s in bucket %s." % (object_name, bucket_name)
            return True

    except Exception as e:
        logging.getLogger().info("exception message {0}".format(e))
        message = "CSV upload failed in bucket %s. " % bucket_name
        if "oci" in e.__class__.__module__:
            if hasattr(e, 'message'):
                message = message + e.message
            else:
                message = message + repr(e)
                print(message)
        return False

In the configuration section, provide the key-value pairs that the function reads dynamically at invocation time.
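The same can also be set from the Fn CLI - a sketch using the same placeholder app/function names as above:

# Configuration keys read via ctx.Config() in the handler
fn config function covid-app covid-loader input-bucket covid-input
fn config function covid-app covid-loader processed-bucket covid-processed
fn config function covid-app covid-loader input-csv <URL of the OWID CSV>
fn config function covid-app covid-loader object-name covid.csv

# Trigger the download + upload manually
fn invoke covid-app covid-loader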

3) COVID Dataset in Object Storage

Let's verify if the COVID-19 dataset is successfully downloaded into our object storage bucket.

4) Loading COVID-19 dataset into Autonomous Database

Since we are leveraging Oracle Cloud Autonomous DB, we can use the out-of-the-box DBMS_CLOUD package to load data from an external object storage bucket. Another variant of this approach is to treat the CSV data in object storage as an external table - ADB lets you query the CSV data directly.

Again, the choice for Autonomous Database helps us focus only on loading and querying the dataset and not have to worry about the underlying infrastructure, patching & maintenance. I like to think of Autonomous DB as a "serverless" database.

Execute the DBMS_CLOUD.COPY_DATA procedure to load data from object storage into ADB. [Ensure you have the base table created to hold the COVID-19 dataset]

BEGIN
 DBMS_CLOUD.COPY_DATA(
    table_name =>'COVID',
    credential_name =>'JSON_CRED_AG',
    file_uri_list =>'YOUR OBJECT STORAGE BUCKET URI',
    format => json_object('type' value 'CSV', 'ignoremissingcolumns' value 'true', 
'delimiter' value ',', 'blankasnull' value 'true', 'skipheaders' value '1', 'dateformat' value 'YYYY-MM-DD')
 );
END;
/
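As an aside, a rough sketch of the external-table variant mentioned above might look like this (the column list is abbreviated and the bucket URI / credential name are placeholders):

BEGIN
 DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'COVID_EXT',
    credential_name => 'JSON_CRED_AG',
    file_uri_list   => 'YOUR OBJECT STORAGE BUCKET URI',
    format          => json_object('type' value 'CSV', 'skipheaders' value '1',
                                   'dateformat' value 'YYYY-MM-DD'),
    column_list     => 'location VARCHAR2(100), report_date DATE, total_cases NUMBER, new_cases NUMBER'
 );
END;
/

-- Query the CSV in object storage directly, without loading it
SELECT location, MAX(total_cases) FROM COVID_EXT GROUP BY location;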

5) Analytics Dashboard

Now that we have our data in a database, we can leverage any analytics / reporting engine to make sense of the data and generate dynamic reports. In our case, we leverage Oracle Analytics and/or Oracle Data Visualization.



Tuesday, September 29, 2020

Mitigate Ransomware Attacks & Protect your data with Oracle Cloud

Recently, I was working with a Fortune 100 retailer. During a cadence with their Chief Technology Officer & Security Advisor, an interesting topic came up for discussion. With ever-growing malware attacks - especially ransomware - the board mandated IT to prioritize a strategy to mitigate and prevent potential ransomware attacks and protect their crown jewel: data.

Board concerns included;

  • Protecting Brand Reputation
  • Immediate need for a cost-effective business continuity plan (BCP)
  • Security Compliance

Enterprises across the world - both large & small - have been impacted by Ransomware and incurred several billion dollars in losses - either through loss of business, time to recover and/or ransom costs.

Per Wikipedia...

Security experts have suggested precautionary measures for dealing with ransomware. Using software or other security policies to block known payloads from launching will help to prevent infection, but will not protect against all attacks. As such, having a proper backup solution is a critical component of defending against ransomware. Because many ransomware attackers will not only encrypt the victim's live machine but also attempt to delete any hot backups stored locally or accessible over the network on a NAS, it's also critical to maintain "offline" backups of data stored in locations inaccessible from any potentially infected computer - such as external storage drives or devices that have no access to any network (including the Internet) - which prevents them from being reached by the ransomware.

Hackers keep finding new & creative ways to disrupt global businesses with malicious intent - Reveton, Fusob, WannaCry, BadRabbit, Petya (remember NotPetya?), SamSam - all different strains of ransomware over the years that have caused billions in losses. Attacks may be impossible to predict, but it is certainly possible to prevent, protect against & mitigate the impact & damage, should there ever be one.

In this blog, I would like to share my perspective and solution on how we helped the customer by leveraging Oracle's Gen2 Cloud Infrastructure services.

One of the core tenets of protecting against malware attacks like ransomware is to maintain consistent, redundant, secure "offline" backups of critical data - since ransomware can traverse the network.

Our proposal encompassed 3 primary factors that are key for enterprise workloads to run uninterrupted;

1. Enterprise Grade Secure Backups & Cloud Storage

Oracle's Gen2 Cloud offers a secure, redundant & enterprise-grade cloud backup & storage solution, aimed not just at storing on-premise data backups (offline backups) but also at managing & automating consistent on-premise data backups. Specifically, the following built-in features offer immutable, versioned, consistent, redundant & secure storage for all kinds of enterprise data.

  • Two distinct storage tiers for hot & cold backup storage
  • Secure & Restricted access with fine-grained IAM policies
  • Object versioning to prevent accidental/malicious object overwrites/deletion (CRUD)
  • Default AES-256 encryption with the ability to use auto- or self-managed keys
  • Rich lifecycle automation policies
  • Retention rules to comply with regulatory compliance and ensure data immutability
  • Configurable Replication policies for data redundancy cross-region
  • Self-healing to ensure data integrity

In addition,

Oracle Storage Gateway offers the ability to deploy the solution with zero disruption, as it exposes cloud storage as a local NFS mount, and

Oracle Database Backup Service automates the management of Oracle database backups from on-premise to the cloud.
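To make the "immutable backups" idea concrete, here is a rough OCI CLI sketch of turning on object versioning and adding a retention rule on a backup bucket (bucket name and durations are placeholders; treat the exact flags as an assumption and check the current CLI reference):

# Keep prior versions of every object so an overwrite/delete can be rolled back
oci os bucket update --bucket-name onprem-backups --versioning Enabled

# Time-bound retention rule: objects cannot be modified or deleted for 90 days
oci os retention-rule create --bucket-name onprem-backups \
    --time-amount 90 --time-unit DAYS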

2. Ensure Business Continuity - Not just offline backups for fallback

Oracle Cloud Gen2 prides itself on being purpose-built for the enterprise. With fundamental building blocks at its core - "off-box virtualization", non-oversubscribed everything (network, bandwidth, compute & storage), a defense-in-depth, security-first cloud architecture - and unique offerings such as modern AMD, Intel and NVIDIA GPUs, HPC, RDMA clustered networking, NVMe & Exadata, customers can rely on Oracle Cloud and treat it as an extension of their on-premise IT.

This provides the ability to spin up VMs, Bare Metal servers, VMWare workloads, Databases (Oracle DB VMs, Physical DBs, MySQL, Exadata, Autonomous, SQL Server) - everything potentially needed to ensure business continuity.

3. Security-First Cloud Architecture & Compliance

At its core, Oracle Cloud offers built-in;

  • Edge-Security through Global PoPs, DDoS protection, DNS security & WAF
  • Monitoring with 3rd party security (FW, NGFW, IPS), configuration monitoring, logging & compliance
  • Virtual Network interface segmentation, Security Lists, IPSec VPN, FastConnect & Private Network
  • Tenant isolation, Hardened Images, HW Entropy, Root-of-Trust Card, HSM & signed firmware
  • Data (At-Rest, In-Transit & Key Vault Management)
  • Identity federation, role-based policies, compartments, tagging and instance principals

In addition,

fine-grained IAM security policies secure & restrict resource access at the finest level,

Multi-Factor Authentication (MFA) adds an additional layer of user security, and

CASB for OCI offers visibility, threat protection, data security and compliance for OCI deployments.

Below is the reference architecture that addresses Ransomware prevention & mitigation strategy for deployments & data in the Oracle Cloud.

Feel free to reach out if you have criticism, feedback or queries.

Monday, September 23, 2019

Automate Oracle APEX Deployment for OCI Database Using Terraform

Recently, I was assisting a customer design & configure APEX (Oracle Application Express) for databases on OCI. The purpose of this blog is to add some finer details, tips & tricks that may help with a successful installation (the process is already documented in some detail in this whitepaper).

Let's quickly look at the architecture / deployment topology. In this example, we will have one centralized APEX instance mapped to multiple database instances. The APEX instance will be provisioned in a public subnet so it can be accessed from the internet. However, the OCI databases will be secured within one or more private subnets.

Step 1: Download Terraform release 0.11.15 (oci) here



Step 2: Depending on your OS, install/configure Terraform on your laptop/PC. For example, if you are running on a Mac, download the darwin_amd64 package and unzip it in a folder.

Step 3: Copy the "terraform" binary to your /usr/local/bin directory to complete the Terraform installation on your machine

$ unzip terraform_0.11.15-oci_darwin_amd64.zip
$ cp terraform /usr/local/bin

Step 4: Download the APEX terraform template & scripts here. This contains the terraform templates & scripts to install and configure ORDS & APEX on an OCI Compute VM within a public subnet.

Step 5: Create a bucket within your OCI object storage. Download the following and place them within this bucket. The Terraform script will use these to install the appropriate versions of APEX, ORDS & web server.

Hint: You can make this bucket public briefly if you don't want to bother with pre-auth requests etc.. since this will only hold the binaries. Once done, we can change the visibility back to private or blow this bucket up.

a) Download the latest APEX binary here
b) Download the latest ORDS binary here
c) Optionally (if you prefer running APEX on Tomcat) download the latest tomcat zip

Note: Make sure the APEX and ORDS versions are compatible with the version of the database you provisioned on OCI.

Step 6: On your OCI tenancy, make sure you have a public subnet within your VCN, with an internet gateway attached and a security list. The security list must have the following ingress rules;

0.0.0.0/0           TCP          TF_VAR_COM_PORT (Fetch this based on the port on which you would expose APEX over.)

Step 7: Open the ingress rule on the private subnet (where your database is running) to allow the port (eg., 1521) from the public subnet (where APEX will be deployed)

Step 8: For simplicity, we will be exposing APEX over IP-based access. If you prefer to front your APEX installation with a DNS name, follow the whitepaper referenced above - it does a great job of covering the DNS options.

Step 9: Unzip the ORDS zip file downloaded in Step 4 above.

Step 10: Run terraform --version and make sure you are running on 0.11.15 version. This is important as the terraform template & scripts are written based on this version. You may encounter errors trying to run this AS-IS with later versions of terraform.

Step 11: Gather the following info before proceeding. You will need access to your OCI tenancy to gather a lot of these details.

a) Generate a pair of SSH public & private keys. These will be used while provisioning the new compute VM that will host APEX & ORDS. Save these for future ssh access to your apex/ords compute VM. Gather the absolute paths for both private & public keys.
b) Generate an API key and fingerprint for your user id (Remember, this is the user that will be used by the terraform OCI provider to make API calls into OCI). If you don't know how to generate an API signing key and fingerprint, refer to the OCI documentation here.
c) OCI Tenancy OCID
d) OCI User ID OCID
e) OCI User Fingerprint
f) OCI Compartment OCID
g) Target Database private IP address
h) Target Database Service Name (If you are running on a multitenant database, make sure you provide the PDB service name and not the root CDB service name).

Hint: Click on the DB Connection button on the OCI DB console. It shows the CDB root connection info along with the service name info. Simply replace the CDB name portion with the PDB name. This should look like <<pdb>>.<<subnet>>.<<vcn>>.oraclevcn.com





i) Region: This is the region identifier where you have the database running & eventually APEX. Get your region identifier here.
j) AD: Availability Domain where you would like to install APEX/ORDS. eg., 1, 2 or 3
k) OEL Version: APEX Compute VM will be provisioned on OEL OS. Indicate which version of OEL you would like installed. eg., 7.6
l) Compute Instance Name & Display Name (Choose an appropriate name)
m) Instance Shape: eg., VM.Standard2.2 for a 2 OCPU VM
n) Object Storage URLs for APEX zip, ORDS war file & optionally tomcat zip files
o) Apex/ORDS webserver port number eg., 8080, 8888 etc..
Hint: Remember to open this port on the public subnet security list

Step 12: cd into the ORDS-APEX_Comp directory and run setup.sh. This script will prompt for all the values described above. Once executed, the script creates an env-vars file that contains these values for Terraform to use. Alternatively, you can edit this file directly and provide the values.
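For reference, the generated env-vars file is essentially a set of TF_VAR_* exports along these lines (the exact variable names come from the bundled scripts - the ones below are the typical OCI provider variables plus the port variable mentioned in Step 6, with placeholder values):

export TF_VAR_tenancy_ocid="ocid1.tenancy.oc1..aaaa..."
export TF_VAR_user_ocid="ocid1.user.oc1..aaaa..."
export TF_VAR_fingerprint="12:34:56:..."
export TF_VAR_private_key_path="/Users/me/.oci/oci_api_key.pem"
export TF_VAR_compartment_ocid="ocid1.compartment.oc1..aaaa..."
export TF_VAR_region="us-ashburn-1"
export TF_VAR_COM_PORT="8080"    # the port APEX/ORDS will be exposed on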

Step 13: Once you make sure all variables are set properly, execute the following;

Hint: You may be prompted for the DB admin password.

terraform init

terraform plan (This should indicate 5 actions that will be performed on your OCI tenancy. The count may change depending on whether you choose Tomcat or Jetty as your web server.)

  + null_resource.remote-exec_init
      id:                                       <computed>

  + null_resource.remote-exec_tomcat-1
      id:                                       <computed>

  + null_resource.remote-exec_tomcat-2
      id:                                       <computed>

  + null_resource.remote-exec_tomcat-apex
      id:                                       <computed>

  + oci_core_instance.ORDS-Comp-Instance
      id:                                       <computed>

terraform apply

This should run for approximately 15-20 mins. Make sure there are no errors while you run this script. If there are errors, execute "terraform destroy" to roll back all changes. Also, if APEX/ORDS was partially deployed on the database (by Terraform), clean this up manually by following the documentation here.

Some common causes of errors;
1) You may have installed APEX manually on this database before. This will conflict with the terraform APEX install attempt.
2) Terraform script partially executed and failed during APEX install. (Hint: One way to find this out is by observing a bunch of PL/SQL ORA-* errors during terraform execution)

The solution is to clean up APEX manually and ensure Terraform is rolled back using the "destroy" command.

To add / map multiple databases to this APEX/ORDS instance, simply execute the apex_add_db.sh script bundled with the ORDS Terraform scripts.

$ ./apex_add_db.sh -p <database_admin_password> -i <IP_address> -s <database_service_name>

Hope this helps quickly spin up APEX for multiple DB instances.

Tuesday, June 4, 2019

Achieve High Availability (HA) with Oracle Cloud Infrastructure (OCI) - using HAProxy & KeepAlived

Features such as High Availability (HA), Elastic Scaling, Disaster Recovery (DR) are no longer restricted to Tier-1 mission critical applications & services. These are now table-stakes for any enterprise-grade cloud provider and enterprises can leverage these at will - as they are now fundamental building blocks of the cloud platform. Oracle Cloud Infrastructure (OCI) offers several modern state-of-the-art capabilities including - Bare Metal servers for extreme performance & HPC, Availability Domains & Fault-Domain constructs for fault-tolerance, HA & DR, Elastic Scaling of compute & storage independently for future-proofing, RAC for highly-available databases, Regional Subnets for seamless datacenter resiliency & failover, LBaaS (Load Balancer as a Service) for platform-driven fully managed instance failovers etc..

Although there are many different ways to achieve HA on Oracle Cloud Infrastructure, we will look at a simpler, more primitive method that leverages just core OCI capabilities and open-source technologies.
Note: This could be more elegantly achieved using OCI's native LBaaS PaaS service as well.

However, in certain situations - Windows Server based apps, container apps, microservice constructs - HAProxy (or similar) & keepalived may better resemble the on-premise experience and/or match a solution preference.

In this article we will walk through detailed step-by-step instructions on how to install, configure and achieve HA & failover using HAProxy & KeepAlived.

This article presumes users have some exposure to HAProxy and KeepAlived concepts, as well as cloud constructs such as virtual networking, subnets, private/public IPs, security lists/route tables etc.
A little bit of Python & bash shell scripting knowledge would be helpful too.
If not, don't worry - I will try my best to point out / reference documentation as much as possible.

Pre-Requisites:

1) An Oracle Cloud Account / Tenancy. If you don't have one, you can request a trial instance here.
2) A Compartment that would host our HA instances
3) A VCN (Virtual Cloud Network) with at least 1 public subnet
4) Administrator access on the cloud instance to configure Identity, Network Rules, Policies & Dynamic Groups

Spin Up 2 OCI VM Instances:

To start, let's first spin up 2 VMs on OCI.

My VCN looks like below;

Need help setting up a VCN on OCI? Refer here

Note: In this example, we have HA within an AD (Availability Domain) leveraging the "Fault Domain" resilience. However, this can be quickly reconstructed with a "regional subnet" construct for a full "site resiliency".

Within a public subnet, spin up 2 VMs.
In my example I have 2 Angry Bird servers - Terence & Stella.
Both run standard single-core VMs with Oracle Enterprise Linux in the same availability domain - but are placed strategically in different fault domains for intra-AD HA.

Now, you should see 2 instances up & running within your VCN.

Let's now SSH into our instances.

ssh -i <<private_key>> opc@<<public_ip>>

Note: If you are unable to SSH into your instance, make sure;
1) The instance is indeed spun up within a public subnet
2) Security List has port 22 enabled (by default this should be there)
3) Ensure you have an internet gateway (IGW) attached to your VCN and route table configured with the route to IGW
4) Finally at the OS level, make sure you open up firewall ports. For a quick test, you can try stopping the firewall on linux OS instances using the command: service firewalld stop

Install your preferred proxy service:

You can choose to install any of your preferred http/https proxy or load balancer service. Some of the most popular ones include - apache httpd, Nginx, HAProxy.

In my example, I used HAProxy.
Install HAProxy on both Terence & Stella VM instances.


sudo su
yum install haproxy

Since we are just going to test the failover / HA configuration, we are not going to actually create any backend sets / services. When the reverse proxy service is called, it will be directed to render a static error page.

Let's create a simple html page under /etc/haproxy/errorfiles/errorpage.http
Replace {ServerName} with appropriate VM names so it helps distinguish the service when it fails over.

HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html>
  <head>
    <title>503 -  Service Unavailable</title>
  </head>
  <body>
    <div>
          <h2>Hello from {ServerName}</h2>
    </div>
  </body>
</html>

Configure HAProxy (/etc/haproxy/haproxy.cfg) with the errorfile info and a frontend bound on port 80. Remember, we don't actually have any real backend services configured, but that's okay for our failover test.

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000
    errorfile 503 /etc/haproxy/errorfiles/errorpage.http
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend hatest
    mode http
    bind *:80
    default_backend             app

#---------------------------------------------------------------------
# static backend for serving up images, stylesheets and such
#---------------------------------------------------------------------
backend static
    balance     roundrobin
    server      static 127.0.0.1:4331 check

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend app
    mode http
    balance     roundrobin
    option httpchk HEAD / HTTP/1.1
    server  app1 127.0.0.1:5001 check

Now, let's configure HAProxy to start on VM boot.
chkconfig haproxy on
service haproxy start

Try hitting both server instances with their respective public ip addresses;
http://publicipofinstance

and you should see the appropriate server's 503 error page with welcome messages.
Note:
If you are unable to connect from your browser, check the public subnet security list for port 80 and the route table for the IGW route, or the OS firewall could be blocking port 80.

Configure Secondary IP:

In order for the instances to fail over, we need a reserved public IP that can shuttle across instances for seamless HA / failover.

In the OCI Console, go to the primary VM instance - pick any one VM that would serve as the "master" node. Click on "Attached VNICs" under the "Resources" section.
We will now create a secondary IP under the "Primary VNIC".
Click on the "Primary VNIC" > "IP Addresses" > choose "Assign Private IP Address".

In the dialog, enter a private ip that is unused within the VCN/Subnet. In my case, I picked 10.0.0.4.


Under the "Public IP Address" section, choose "Reserved Public IP" and select "Create a New Reserved Public IP". Optionally give it a name.

This would be our reserved public ip - which would move along with our chosen private ip address.

Go back to the primary VNIC of your VM instance and notice you have 2 public IPs (one ephemeral IP that is OCI assigned and another reserved IP) assigned to the instance. This technically means the VM can be accessed via either IP.


However, we need to make sure the OS config is updated to reflect this.

The quicker option is to execute the following command (however, this will not persist across VM reboots).
In my case, the command looks like below;

ip addr add 10.0.0.4/25 dev ens3 label ens3:0
Syntax: ip addr add <address>/<subnet_prefix_len> dev <phys_dev> label <phys_dev>:<addr_seq_num>

To make this change persistent, create an ifcfg file named /etc/sysconfig/network-scripts/ifcfg-<phys_dev>:<addr_seq_num>. To continue with the preceding example, the file name would be /etc/sysconfig/network-scripts/ifcfg-ens3:0, and the contents would be:

DEVICE="ens3:0"
BOOTPROTO=static
IPADDR=10.0.0.4
NETMASK=255.255.255.128
ONBOOT=yes

Note: This step alone has to be performed on both VM instances - when the private IP moves (along with the reserved IP) to the standby instance, the OS must be able to recognize the IP mapping.

To verify this change, try accessing the Terence VM with both IP addresses via a browser.

Install KeepAlived:

We will leverage KeepAlived to maintain our server pool, monitor the VM instances and shuttle the IP address. In our example, we will use the VRRP protocol and unicast IP addressing.

Make sure to add the VRRP protocol rule to the subnet security list. This will allow the VM instances to communicate over VRRP.
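For reference, the ingress rule would look roughly like this (VRRP is IP protocol number 112; the CIDR below is my subnet's, so adjust it to yours):

Source CIDR: 10.0.0.0/24
IP Protocol: 112 (VRRP)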

Let's install keepalived on both VM instances.

sudo su
yum install keepalived


Modify the keepalived config file at /etc/keepalived/keepalived.conf.
My config files for the Terence (Primary / Master) and Stella (Secondary / Backup) instances look like below;

Note the source IP (IP of the current server instance), peer IP (IP of the backup instance) and state fields. Make sure the priority is higher on the Master node.

! Configuration File for keepalived

vrrp_script check_haproxy
{
    script "pidof haproxy"
    interval 5
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens3
    virtual_router_id 50
    priority 101
    unicast_src_ip 10.0.0.2

    unicast_peer
    {
        10.0.0.3
    }
    track_script
    {
        check_haproxy
    }

    notify_master /etc/keepalived/failover.sh
}

! Configuration File for keepalived
vrrp_script check_haproxy
{
    script "pidof haproxy"
    interval 5
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens3
    virtual_router_id 50
    priority 99
    unicast_src_ip 10.0.0.3

    unicast_peer
    {
        10.0.0.2
    }
    track_script
    {
        check_haproxy
    }

    notify_master /etc/keepalived/failover.sh
}

Configure Instance Principals in OCI:

We will leverage the OCI Instance Principals to allow instances within the server pool to manage virtual network connections. This would enable the reserved IP to move across VM instances.

Create a dynamic group with a matching rule that ensures all VMs within our server pool are added to the group. More details on how to create a dynamic group here.
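A simple matching rule that pulls in every instance in the compartment hosting our two VMs would look like this (using the same placeholder compartment OCID as the policy below):

ANY {instance.compartment.id = 'ocid1.compartment.oc1..aaaaaaaaaxxxxxxxxxxxxxxxxa'}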
Now, create a policy to allow the dynamic group to manage virtual network connectivity.

In this case, the policy would look like below;

Allow dynamic-group HAProxyDG to manage virtual-network-family in compartment id ocid1.compartment.oc1..aaaaaaaaaxxxxxxxxxxxxxxxxa

Install Python OCI SDK:

Let's now install the Python OCI SDK on both VM instances.

sudo su
yum install -y python


# Download and install pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py

# install python OCI SDK
pip install oci

Let's now start the keepalived daemon on both VMs. You can use chkconfig to make the service start on boot; however, there have been several intermittent issues reported with running keepalived as a service.
Another workaround is to start this via command line using;
keepalived -D

Python Script using OCI SDK to Migrate IP from Master VM to Backup VM:

import sys
import oci

def assign_to_different_vnic(private_ip_id, vnic_id):
    # Re-home the floating private IP (and its reserved public IP) onto the new master's VNIC
    update_private_ip_details = oci.core.models.UpdatePrivateIpDetails(vnic_id=vnic_id)
    network.update_private_ip(private_ip_id, update_private_ip_details)

if __name__ == '__main__':
    # Instance principal auth - no API keys on the VM; relies on the dynamic group & policy above
    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
    network = oci.core.VirtualNetworkClient(config={}, signer=signer)

    # argv[1] = OCID of this VM's VNIC, argv[2] = OCID of the floating private IP
    new_vnic_id = sys.argv[1]
    privateip_id = sys.argv[2]
    assign_to_different_vnic(privateip_id, new_vnic_id)

We will now call this Python script (saved as /home/opc/claimip.py in my case) from a shell script.
Create this shell script under /etc/keepalived/failover.sh
Note: This script will be invoked by the keepalived daemon.

#!/bin/bash

logger -s "Floating the private/public VIPs:"
python /home/opc/claimip.py {ocid of vnic} {ocid of private ip} > >(logger -s -t $(basename $0)) 2>&1
logger -s "Private/public VIPs attached to the NEW Master Node!"

Make sure that in each of the VM instances, the VNIC OCID is set properly (each VM passes its own VNIC OCID). The private IP OCID will remain the same - since it is the same IP (in our case 10.0.0.4) that floats across VMs.
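If you need to look up these OCIDs from the CLI, something along these lines should work (OCIDs are placeholders, and the exact flag names may vary by CLI version, so treat this as a sketch):

# OCID of the floating private IP (10.0.0.4)
oci network private-ip list --subnet-id ocid1.subnet.oc1..xxxx --ip-address 10.0.0.4

# VNIC attachments (and hence VNIC OCIDs) for an instance
oci compute vnic-attachment list --compartment-id ocid1.compartment.oc1..xxxx --instance-id ocid1.instance.oc1..xxxx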

We are all set. Access the reserved public IP (the HAProxy service) from a browser;
You should now see "Hello from Terence" - since the IP is assigned to this master node.

Now, try stopping the haproxy service: service haproxy stop.
Refreshing the page should now render "Hello from Stella" - as the IP has moved over to the backup node.

We have now created an HA configuration. For fun, start the haproxy service back up on Terence and stop it on Stella.

Sunday, December 17, 2017

Transform your on-premise Oracle Investments to Cloud - A Perspective !!

This article is an inspiration from some of the questions that I get asked by customers every day.

1) We have made a lot of on-premise Oracle investments - especially database. How can Oracle help us with our cloud transformation initiatives?
2) How does Oracle DB Cloud Service compare to AWS RDS, Oracle software on Azure? Why should we choose Oracle cloud over competition?
3) What is Oracle's strategy and vision for enterprise customers who have made significant investments over the years on-prem?
4) Other than price-point TCO benefits, what other benefits does Oracle Cloud offer?

In my job role as an enterprise cloud architect, I engage with my customers by bringing in a point-of-view that helps nurture long-term strategy discussions, enrich ideas, propose solution & options to further their cloud/digital transformation endeavors.

In this article, we will analyze a typical customer scenario with various cloud options, inherent PaaS advantages, cost comparisons and non-quantifiable benefits.

Before we delve deep into the details and cost comparisons, I want to state a safe harbor disclaimer that all views (including data points, pricing and options) expressed in this article are my own, based on my experience, and do not necessarily reflect the views of Oracle. As an Oracle enthusiast and evangelist, this article is purely intended to present a point of view and analyze options, value and benefits.

Okay.. Let's take a quick peek at the Oracle database cloud offerings. Built on the basic premise of offering "complete choice", customers have the option to subscribe to the smallest standard DB instance on a VM for development, 2-node RAC cluster DB instance on bare metal for high performance production workloads or opt for the subscription based extreme performance Exadata in the cloud.

Unique to Oracle Cloud, for customers with existing on-premise database licenses, it's an understatement to say the BYOL PaaS pricing model is "attractive". Just for quick comparison, at published pay-as-you-go pricing;

License included DBCS Enterprise Edition (1 OCPU / Hour) is $0.8064
BYOL to Oracle DBCS Enterprise Edition (1 OCPU /Hour) is $0.2903

That is 64% savings right off the bat.
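A quick back-of-the-envelope check of that number, for the curious:

# Hourly list prices quoted above
license_included = 0.8064   # $/OCPU-hour, license-included DBCS EE
byol = 0.2903               # $/OCPU-hour, BYOL DBCS EE
print(round((license_included - byol) / license_included * 100))   # ~64 (% savings)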

1 OCPU is the equivalent of one physical core of an Intel Xeon processor with hyper-threading enabled - the equivalent of 2 AWS vCPUs or 1 Azure core.

Let's now look at how this compares to Oracle database on AWS, Azure and GCP. This list is not exhaustive but a selection of a few key considerations for enterprise mission-critical workloads.

AWS and Azure are authorized cloud environments. Google Cloud Platform is not an authorized cloud environment for Oracle Database (predominantly because of how GCP virtualizes their servers).

However, should customers choose AWS or Azure to host Oracle Database? That depends on a few factors;

The first and foremost consideration when customers move workloads to the cloud: IaaS or PaaS? A database on IaaS only offers "IaaS" benefits like saving datacenter costs. PaaS options like Oracle Database Cloud Service offer higher-level service benefits in the cloud, including automated provisioning, elastic scaling, patching, rollback, etc.

a) High Availability (HA): For customers with HA needs, this could be a deal breaker as neither Azure nor AWS support RAC (Real Application Clusters). At best AWS RDS offers replication and Multi-AZ deployments but not with zero-downtime.

b) PaaS / Fully Managed: If you are looking for fully managed, elastic, seamlessly scalable, full-stack patching capabilities, AWS/Azure may not be the right fit.

c) License Cost: Although AWS and Azure are authorized cloud environments for running Oracle database, when counting Oracle Processor license requirements, the Oracle Processor Core Factor Table is not applicable. This basically makes it 2x more expensive for customers to run Oracle database on AWS/Azure than on-premise.

d) Provisioned IOPS: Costs can quickly add up if customers choose "provisioned IOPS" SSD for storage. By default, for all workloads Oracle Cloud offers high performance NVMe based SSD storage.

e) Data Security & Encryption: TDE (Transparent Data Encryption) is included and enabled by default in the Oracle Cloud for all Oracle editions and options (including database standard edition). For example, with AWS the customer must buy the "Advanced Security" option.

f) Database Options: Oracle cloud bundles database options into 4 broad offerings. Standard, Enterprise, Enterprise High Performance & Enterprise Extreme Performance. For BYOL customers, even the basic Enterprise Edition comes included with database options such as Diagnostics Pack, Tuning Pack, Real Application Testing, Data Masking & Subsetting Pack. This means, customers with Database EE license can leverage these features in the cloud even if they are not currently licensed on-premise - thus presenting a huge advantage.

g) Backup & Restore: Oracle offers in-place restore for your database backups. This means, you can choose from any of the available backups (automated / point-in-time / most recent) and perform a restore on the same database instance. In contrast, AWS allows restore from backups but creates a "new" database instance - potentially impacting application connectivity, VPC, security group re-configuration.

Now, let's take a typical customer scenario as we walk through various options;

Current Install Base (8 Processor Licenses):

  • Oracle Database Enterprise Edition

Licensed Database Options:
  • Partitioning
  • Real Application Clusters (RAC)
  • Active Data Guard
  • Advanced Compression
  • Database Vault
  • Diagnostics Pack
  • Tuning Pack
  • OLAP
  • Advanced Security

Quick note on Oracle on-prem license metrics - 1 Processor license typically has a 0.5 core factor multiplier unless customers have deployed on high-horsepower systems such as Intel Itanium or IBM Power.

In this scenario, this means the customer can deploy Oracle software on 16 cores - typically equivalent to 32 vCPUs in a virtualized environment (assumption: 1 physical core -> 2 threads).

At list price, the license cost of the above configuration would be $1.27 M; with first-year support, the Year 1 total comes to $1.63 M. Pragmatically, at a 60% discount, this could be around $500 K.


                     Year 1     Year 2     Year n
DB EE License        $1.27 M    $0         $0
Support              $358 K     $358 K     $358 K
Total                $1.63 M    $358 K     $358 K
@ 60% Discount       $508 K     $143 K     $143 K

Now, let's pivot this on-premise database to PaaS (Database as a Service)...
The customer has 2 options;

  • Subscribe to "license-included" DBCS (PaaS). This would preserve their on-prem licenses which could be re-purposed for other projects still on-prem
  • BYOL (Bring Your Own License) option - Convert on-premise database investments to cloud with heavily discounted PaaS subscription costs (Credits applied since customer owns on-prem Oracle database licenses)

For the same configuration, the closest license-included DBCS option is DBCS Extreme Performance (support for RAC & Active Data Guard). The customer is also entitled to other database options like In-Memory, Advanced Analytics, etc., as they are bundled under the Extreme Performance edition.

However, with BYOL, customers can bring their DB Enterprise Edition license along with the licensed options and run them on Oracle Cloud as PaaS. In this case, the customer also gains access to features like Real Application Testing and the Data Masking & Subsetting Pack.

This is another unique Oracle Cloud feature. For example, AWS does not offer a "license-included" RDS for Oracle Database Enterprise Edition.

Irrespective of the option, the subscription cost includes the underlying infrastructure (compute, storage & networking), infrastructure support, the software (database) license, software support and automation.

                                            Year 1    Year 2    Year n
License Included DBCS Extreme Performance   $360 K    $360 K    $360 K
BYOL DBCS EE                                $41 K     $41 K     $41 K

Clearly, the BYOL option is a winner, with ~89% savings over license-included PaaS.

That's not all - the above is based on published PAYG pricing; further discounting is available with monthly commitments.

Of course, one size does not fit all! Customers have a wide range of options for deploying on VMs, Bare Metal or Exadata. Engage your Oracle team for value-add services including portfolio analysis, TCO analysis & a tailored roadmap.

Please leave your feedback and thoughts.

Monday, November 20, 2017

The “Enterprise Cloud”: 5 reasons why Oracle’s Next-Gen Cloud Infrastructure is perfect for your Enterprise

Spend, security & sustainability are most likely the top 3 concerns of any CIO/CDO in the cloud era, and the "cloud transformation" trend is at its peak. As enterprises look to pivot to the cloud, it's imperative not to create a "cloud spaghetti" – the same issue that haunts traditional on-prem systems. It is not the first one-off experimental project or the lift & shift of an application to cloud infrastructure that adds value in the longer run – painting the enterprise's broader vision, ensuring the cloud vendor's compliance with standards, seamless integration options (PaaS), and a roadmap for cloud maturity/evolution (SaaS) toward higher levels of service efficiency should all be key concerns of enterprise architects.

Purpose-built for diverse enterprise workloads, the next-gen Oracle Cloud Infrastructure promises consistent peak performance, standards compliance and choice, at simple, intuitive pricing.

Here are 5 ways Oracle Cloud Infrastructure uniquely offers these capabilities;

1)      Modern X7 and GPU Instances

Oracle Cloud Infrastructure offers compute for a variety of workloads - from cloud-native application development to graphics-intensive application workloads. Modern X7 Skylake processors with up to 52 OCPUs - available in Standard, High IO and Dense IO shapes with local high-speed NVMe storage - and Tesla P100 GPUs based on the NVIDIA Pascal generation power Oracle Cloud Infrastructure.

2)      Choice of Compute & Deployment

Oracle is uniquely positioned to offer 3 deployment models – public cloud, private cloud & Cloud@Customer – to serve customers of all shapes, sizes, needs and maturity levels. Customers can provision dedicated bare-metal servers in the cloud, where no provider software resides, or virtual machine instances, based on their needs. Also unique to Oracle Cloud Infrastructure is that it is optimized to run Oracle Databases and Oracle Applications, helping customers with their transition to the cloud.

3)      High Throughput 25Gbps Flat Network Infrastructure

With a flat network design, reaching any compute or storage node within Oracle Cloud Infrastructure takes no more than 2 hops – extreme performance. Latency between any two nodes within an Availability Domain is < 100 microseconds, and < 1 millisecond between Availability Domains. Unique to Oracle Cloud Infrastructure is the fact that there is no “tax” for HA – customers pay no “data transfer” charges for HA between Availability Domains.

4)      High Performance NVMe local & Flash-based Block Storage

Oracle Cloud offers best-in-class storage using industry-leading NVMe SSDs. In terms of performance, this means customers can get up to 25,000 IOPS per storage volume. Unique to Oracle Cloud Infrastructure is the model where customers don’t get charged for provisioned IOPS, which makes IOPS-intensive use cases much cheaper to run. With out-of-the-box data-at-rest encryption, integrated backups and redundancy, customers pay a little over 4 cents per GB per month – that’s ~$500 per TB per year!

5)      Network Isolation

With security at the core of the design, Oracle Cloud Infrastructure virtualizes at the network layer – where it truly belongs. This helps fully encapsulate every customer’s traffic in a completely private SDN. With highly customizable VCNs (Virtual Cloud Networks), fully configurable IP addresses, subnets, routing, firewall and connectivity services, organizations can seamlessly extend their IT infrastructure by mirroring their internal networks or build new network topologies with fine-grained control.