Friday, December 25, 2020

Build your own COVID-19 Data Analytics Leveraging Open Source - Serverless Fn & Autonomous Database

Firstly, a huge thanks to all the frontline workers, healthcare professionals, technology contributors, non-profit organizations and everyone else fighting the pandemic every day amidst the risks involved and, more importantly, the sacrifices made - including the most important thing that can never be retrieved - TIME.

As Oracle Cloud continues to aid COVID-19 vaccine research & trials and helps manage the COVID-19 vaccine distribution program, I came up with a simple (humble) contribution that helps analyze and report COVID-19 data.

Although there are several dashboards and reports readily available on the web, they each cater to a specific region, locale & metric. For example, what if one would like to understand the total COVID-19 cases reported in Belarus on/up to a specific date, sliced by total tests conducted, new cases that tested positive, and hospitalization ratio per million against a median age group?

Data is publicly available, but making sense of it is the key. In this article, we will see how to build our own COVID-19 analytics - leveraging simple & popular open-source tools and technology with the help of the public cloud (e.g. the Autonomous Database in Oracle Cloud, which is auto-managed, auto-tuned & auto-patched). [PS: You can also replace Autonomous DB with MySQL.] What's more - we can potentially achieve this for free with an Always Free cloud instance.

Let's quickly take a look at the high-level solution architecture;


Each component of this architecture is loosely coupled and can be replaced or further enhanced for functionality/convenience. For instance, I have leveraged the open-source Fn Project serverless engine that natively runs on Oracle Cloud - this can be replaced with Node.js or Java code running in a Docker container on Kubernetes. Similarly, the Autonomous Database can be replaced with a MySQL DB.

Let's take a look at the key components of this solution;

1) Source dataset [Courtesy: Our World in Data]

"Our World in Data" offers COVID-19 data in a variety of data formats and most importantly offers daily updates - so we can keep our analytics up-to-date.

In our case, we get the dataset in CSV format here

2) Python script deployed on Oracle Fn

I have leveraged the Oracle Cloud Functions serverless platform (based on the open-source Fn Project) to deploy a simple Python script that downloads the COVID-19 dataset into an object storage bucket.

The choice of Oracle Cloud Functions is particularly helpful in this case because I don't have to manage any infrastructure or deal with packaging and version-controlling Docker containers - it lets me focus only on the business logic. It also natively supports a variety of programming languages, including Python, Go & Java. Most importantly, it has built-in security and offers out-of-the-box support for event notifications & triggers and the ability to expose functions as APIs.

Pre-Req: 

Create a dynamic group for Oracle Functions and ensure you have a policy defined in your compartment/tenancy that gives Oracle Functions the ability to access/read the CSV in the object storage bucket;

Instances that meet the criteria defined by any of these rules will be included in the dynamic group.

ALL {resource.type = 'fnfunc', resource.compartment.id = 'ocid1.compartment.oc1..abcdefgxyz'}

Allow dynamic-group FnDynamicGroup to manage objects in compartment sathya.ag

Let's create an Oracle Functions application;

Oracle Cloud -> Developer Services -> Functions -> Create Application & give it a name

Applications let us group several Oracle Functions together. To create a serverless function;

For a quick setup, you can leverage the Cloud Shell under the Getting Started instructions to deploy the following Python code. The Oracle Functions platform packages the code as a Docker container, uploads the image to the default Oracle Cloud Docker registry and automatically deploys it as a serverless function with an invoke endpoint.
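
If you go the CLI route, the typical Fn flow from Cloud Shell looks roughly like this - the application/function names covid-app and covid-fn are illustrative, the Python code below goes into the generated func.py, and oci gets added to requirements.txt;

fn init --runtime python covid-fn
cd covid-fn
fn -v deploy --app covid-app
fn invoke covid-app covid-fn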

import io
import json
import logging
import urllib.request

import oci
from fdk import response


def progress_callback(bytes_uploaded):
    print("{} additional bytes uploaded".format(bytes_uploaded))


def handler(ctx, data: io.BytesIO = None):
    logging.getLogger().info("Got incoming request")
    # Resource principal signer: the function authenticates through the
    # dynamic group defined above, so no credentials ship with the code
    signer = oci.auth.signers.get_resource_principals_signer()
    try:
        # Key-value pairs set in the function's configuration section
        cfg = ctx.Config()
        input_bucket = cfg["input-bucket"]
        processed_bucket = cfg["processed-bucket"]  # used by the optional move step
        input_csv = cfg["input-csv"]      # URL of the source CSV
        object_name = cfg["object-name"]  # target object name in the bucket
    except Exception:
        print("Missing function configuration parameters", flush=True)
        raise
    logging.getLogger().info("before calling load data {0} {1} {2}".format(
        input_bucket, input_csv, object_name))

    # Download the dataset into the function's temporary storage
    logging.getLogger().info("download start!")
    filename, headers = urllib.request.urlretrieve(input_csv, filename="/tmp/covid.csv")
    logging.getLogger().info("download complete!")

    load_data(signer, input_bucket, filename, object_name)
    #move_object(signer, input_bucket, processed_bucket, object_name)

    return response.Response(
        ctx,
        response_data=json.dumps({"status": "Success"}),
        headers={"Content-Type": "application/json"}
    )


def load_data(signer, bucket_name, file_path, object_name):
    logging.getLogger().info("inside load data function {0} {1}".format(
        bucket_name, object_name))
    client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
    try:
        print("INFO - About to read object {0} from local folder and upload to "
              "bucket {1}...".format(object_name, bucket_name), flush=True)
        namespace = client.get_namespace().data

        # Use UploadManager to do a multi-part upload of the file, with 3 parallel uploads
        logging.getLogger().info("before calling uploadmanager")
        upload_manager = oci.object_storage.UploadManager(
            client, allow_parallel_uploads=True, parallel_process_count=3)
        resp = upload_manager.upload_file(
            namespace, bucket_name, object_name, file_path,
            progress_callback=progress_callback)
        logging.getLogger().info("response status {0}".format(resp.status))
        if resp.status == 200:
            logging.getLogger().info(
                "Successfully uploaded {0} in bucket {1}.".format(object_name, bucket_name))
            return True

    except Exception as e:
        logging.getLogger().info("exception message {0}".format(e))
        message = "Upload failed in bucket %s. " % bucket_name
        # OCI service errors carry a useful message attribute
        if "oci" in e.__class__.__module__:
            message += e.message if hasattr(e, "message") else repr(e)
        print(message, flush=True)
        return False

In the configuration section of the function, provide the key-value pairs that are read dynamically by Oracle Functions at invoke time.
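
For example - bucket and object names here are illustrative, and the input-csv value is the OWID download URL at the time of writing;

input-bucket      covid-input-bucket
processed-bucket  covid-processed-bucket
input-csv         https://covid.ourworldindata.org/data/owid-covid-data.csv
object-name       covid.csv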

3) COVID Dataset in Object Storage

Let's verify that the COVID-19 dataset was successfully downloaded into our object storage bucket.
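
Apart from eyeballing the bucket in the console, a quick check is a short OCI Python SDK snippet like the one below - the bucket name is illustrative and it assumes a local ~/.oci/config profile;

import oci

# Authenticate with the local config file (resource principals only work inside the function)
config = oci.config.from_file()
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data

# List objects in the input bucket to confirm the dataset landed
listing = client.list_objects(namespace, "covid-input-bucket", fields="name,size")
for obj in listing.data.objects:
    print(obj.name, obj.size)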

4) Loading COVID-19 dataset into Autonomous Database

Since we are leveraging Oracle Cloud Autonomous DB, we can use the out-of-the-box DBMS_CLOUD package to load data from an external object storage bucket. Another variant of this approach is to expose the CSV data in object storage as an external table - ADB lets you query the CSV data directly, as sketched below.
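
Here is a rough sketch of that external table variant - credential & URI placeholders follow the COPY_DATA example below, the column list is abbreviated (extend it to match the CSV header) and the CSV's "date" column is renamed since DATE is a reserved word;

BEGIN
 DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name => 'COVID_EXT',
    credential_name => 'JSON_CRED_AG',
    file_uri_list => 'YOUR OBJECT STORAGE BUCKET URI',
    format => json_object('type' value 'CSV', 'skipheaders' value '1',
                          'blankasnull' value 'true', 'dateformat' value 'YYYY-MM-DD'),
    column_list => 'iso_code VARCHAR2(16), continent VARCHAR2(32),
                    location VARCHAR2(128), report_date DATE,
                    total_cases NUMBER, new_cases NUMBER'
 );
END;
/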

Again, the choice of Autonomous Database helps us focus only on loading and querying the dataset, without having to worry about the underlying infrastructure, patching & maintenance. I like to think of Autonomous DB as a "serverless" database.

Execute the DBMS_CLOUD.COPY_DATA procedure to load data from object storage into ADB. [Ensure you have the base table and an object storage credential created first - a sketch of both follows.]
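
A minimal sketch of those prerequisites - the credential (username & auth-token values are placeholders) and an abbreviated base table matching the first few columns of the OWID CSV;

BEGIN
 DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'JSON_CRED_AG',
    username => 'YOUR_CLOUD_USERNAME',
    password => 'YOUR_AUTH_TOKEN'
 );
END;
/

CREATE TABLE covid (
  iso_code     VARCHAR2(16),
  continent    VARCHAR2(32),
  location     VARCHAR2(128),
  report_date  DATE,
  total_cases  NUMBER,
  new_cases    NUMBER
  -- ...one column per CSV field, in file order
);

With those in place, the actual load is a single call;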

BEGIN
 DBMS_CLOUD.COPY_DATA(
    table_name => 'COVID',
    credential_name => 'JSON_CRED_AG',
    file_uri_list => 'YOUR OBJECT STORAGE BUCKET URI',
    format => json_object('type' value 'CSV', 'ignoremissingcolumns' value 'true',
                          'delimiter' value ',', 'blankasnull' value 'true',
                          'skipheaders' value '1', 'dateformat' value 'YYYY-MM-DD')
 );
END;
/

5) Analytics Dashboard

Now that we have our data in a database, we can leverage any analytics/reporting engine to make sense of it and generate dynamic reports. In our case, we leverage Oracle Analytics and/or Oracle Data Visualization.
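
For instance, the Belarus question from the introduction boils down to a query along these lines - assuming columns like total_tests, hosp_patients_per_million & median_age from the OWID header were included when creating the base table;

SELECT report_date, total_cases, new_cases, total_tests,
       hosp_patients_per_million, median_age
  FROM covid
 WHERE location = 'Belarus'
   AND report_date <= DATE '2020-12-01'
 ORDER BY report_date;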



Tuesday, September 29, 2020

Mitigate Ransomware Attacks & Protect your data with Oracle Cloud

Recently, I was working with a Fortune 100 retailer. During a cadence with their Chief Technology Officer & Security Advisor, an interesting topic came up for discussion: with ever-growing malware attacks - especially ransomware - the board had mandated IT to prioritize a strategy to mitigate and prevent potential ransomware attacks and protect their crown jewel (data).

Board concerns included;

  • Protecting Brand Reputation
  • Immediate need for a cost-effective business continuity plan (BCP)
  • Security Compliance

Enterprises across the world - both large & small - have been impacted by ransomware and incurred several billion dollars in losses - through lost business, time to recover and/or ransom payments.

Per Wikipedia...

Security experts have suggested precautionary measures for dealing with ransomware. Using software or other security policies to block known payloads from launching will help to prevent infection, but will not protect against all attacks. As such, having a proper backup solution is a critical component of defending against ransomware. Note that, because many ransomware attackers will not only encrypt the victim's live machine but will also attempt to delete any hot backups stored locally or accessible over the network on a NAS, it's also critical to maintain "offline" backups of data, stored in locations inaccessible from any potentially infected computer - such as external storage drives or devices that do not have any access to any network (including the Internet) - which prevents them from being accessed by the ransomware.

As hackers find new & creative ways to disrupt global businesses with malicious intent - Reveton, Fusob, WannaCry, BadRabbit, Petya (remember NotPetya?), SamSam - all different strains of ransomware over the years that have caused billions in losses - such attacks might be impossible to predict, but it is certainly possible to prevent, protect against & mitigate the impact & damage, should there ever be one.

In this blog, I would like to share my perspective and solution on how we helped the customer by leveraging Oracle's Gen2 Cloud Infrastructure services.

One of the core tenets of security for protecting against malware attacks like ransomware is to maintain consistent, redundant, secure "offline" backups of critical data - since ransomware can traverse the network.

Our proposal encompassed 3 primary factors that are key for enterprise workloads to run uninterrupted;

1. Enterprise Grade Secure Backups & Cloud Storage

Oracle's Gen2 Cloud offers a secure, redundant & enterprise-grade cloud backup & storage solution aimed not just at backing up on-premise data (offline backups) but also at managing & automating consistent on-premise data backups. Specifically, the following built-in features offer immutable, versioned, consistent, redundant & secure storage for all kinds of enterprise data (two of these are sketched in code right after the list);

  • Two distinct storage tiers for hot & cold backup storage
  • Secure & Restricted access with fine-grained IAM policies
  • Object versioning to prevent accidental/malicious object overwrites/deletion (CRUD)
  • Default AES-256 encryption with the ability to use Oracle-managed or customer-managed keys
  • Rich lifecycle automation policies
  • Retention rules to meet regulatory compliance requirements and ensure data immutability
  • Configurable replication policies for cross-region data redundancy
  • Self-healing to ensure data integrity
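
As a rough sketch, two of these features can be switched on with the OCI Python SDK along the following lines - the bucket name is illustrative, and model/parameter names should be checked against the SDK docs;

import oci

config = oci.config.from_file()
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data

# Turn on object versioning so overwrites & deletes keep prior versions
client.update_bucket(
    namespace, "onprem-db-backups",
    oci.object_storage.models.UpdateBucketDetails(versioning="Enabled"))

# Add a 30-day duration-based retention rule - objects stay immutable for
# that window, even for users that hold delete permissions
client.create_retention_rule(
    namespace, "onprem-db-backups",
    oci.object_storage.models.CreateRetentionRuleDetails(
        display_name="ransomware-guard-30d",
        duration=oci.object_storage.models.Duration(
            time_amount=30, time_unit="DAYS")))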

In addition,

Oracle Storage Gateway offers the ability to deploy the solution with zero disruption, as it exposes cloud storage as a local NFS mount, and

the Oracle Database Backup Service automates the management of Oracle database backups from on-premise to cloud.

2. Ensure Business Continuity - Not just offline backups for fallback

Oracle Cloud Gen2 prides itself on being purpose-built for the enterprise. With fundamental building blocks at its core - such as off-box virtualization; non-oversubscribed everything (network bandwidth, compute & storage); a defense-in-depth, layered, security-first cloud architecture; and unique offerings such as modern AMD, Intel & Nvidia GPUs, HPC, RDMA clustered networking, NVMe & Exadata - customers can rely on Oracle Cloud and treat it as an extension of their on-premise IT.

This provides the ability to spin up VMs, bare metal servers, VMware workloads and databases (Oracle DB VMs, physical DBs, MySQL, Exadata, Autonomous, SQL Server) - everything potentially needed to ensure business continuity.

3. Security-First Cloud Architecture & Compliance

At its core, Oracle Cloud offers built-in;

  • Edge-Security through Global PoPs, DDoS protection, DNS security & WAF
  • Monitoring with 3rd party security (FW, NGFW, IPS), configuration monitoring, logging & compliance
  • Virtual Network interface segmentation, Security Lists, IPSec VPN, FastConnect & Private Network
  • Tenant isolation, Hardened Images, HW Entropy, Root-of-Trust Card, HSM & signed firmware
  • Data (At-Rest, In-Transit & Key Vault Management)
  • Identity federation, role-based policies, compartments, tagging and instance principals

In addition,

fine-grained IAM security policies secure & restrict resource access at the finest level,
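
For example, policies along these lines - group, compartment & bucket names are illustrative - restrict the backup bucket to a dedicated admin group while everyone else gets read-only access;

Allow group BackupAdmins to manage objects in compartment Backups where target.bucket.name = 'onprem-db-backups'
Allow group BackupOperators to read objects in compartment Backups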

Multi-Factor Authentication (MFA) adds an additional layer of user security, and

CASB for OCI offers visibility, threat protection, data security and compliance for OCI deployments.

Below is the reference architecture that addresses Ransomware prevention & mitigation strategy for deployments & data in the Oracle Cloud.

Feel free to reach out if you have any criticism, feedback or queries.