Wednesday, April 6, 2022

Protect Sensitive Data based on Role Based Access Control in BigQuery


Leverage policy tags and row/column level security in BigQuery datasets to secure sensitive data. With role based access controls, BigQuery users will only be able to see and access data (specific rows / columns) that they are privileged with. This works seamlessly with Google Cloud Data Catalog service.

To know more visit my blog here:




Saturday, February 19, 2022

Developing a ML model with Google BigQuery to predict solar coverage

 Google BigQuery lets users develop and operationalize advanced ML models with just SQL skills. You don't have to be a Tensorflow or Keras expert or a data scientist to train ML models. Follow along with this article if you are interested to see how to develop a linear regression ML model in BigQuery to predict the solar coverage percentage for US states.


https://medium.com/@agsathya/develop-and-operationalize-ml-models-using-plain-sql-on-google-bigquery-462bc2bbe441


Thursday, October 21, 2021

Achieve higher levels of cloud maturity with socially responsible IT


Photo by Joey Kyber from Pexels

What is the highest level of cloud maturity in an organization? In my opinion, it is delivering capabilities that are environmentally & socially responsible.

As a visionary, we at Google can help you develop applications and services that are not just cool but also economically and socially responsible. Google has been carbon neutral since 2007 and has committed to operating on 100% carbon-free energy (CFE) 24x7 by 2030. Google data centers are architected to use much less energy than a typical data center - read more about it here.

If you missed the Next ‘21 session on ways to reduce your carbon footprint, you can still catch up on it here.

Just by running your workloads on Google Cloud, you can have a positive impact on the planet. However, it doesn’t have to stop there. Google Cloud offers several metrics, tools and dashboards to help you understand and reduce your carbon footprint including the ability to adhere to GHG Scope 3 standard reporting requirements.

Here are 3 strategies that you can employ to reduce your carbon footprint;

1. Choose the most eco-friendly cloud region

Google publishes carbon data for all its cloud regions and you can see the CFE% and the local electricity grid’s carbon intensity. This will help you pick cleaner cloud regions (if you can) to run your workloads.

  • CFE % - This metric is updated hourly and it represents the average percentage of carbon-free energy consumed in that particular region. In simple terms, you can think of this as the percentage of time your application would run on carbon-free energy. Higher the % the better it is.

  • Grid Carbon Intensity - This metric indicates the gross carbon emissions from the grid per unit of energy. In layman terms, the lower this value, the better.

On the Google Cloud console, some cloud regions may have the Low CO2 marker against them. This means that the region has a Google CFE% of at least 75% and/or a grid carbon intensity of 200 gCO2eq/kWh or less.

Using a combination of the CFE% and the grid carbon intensity indicators, you now have the choice to run your workloads that are eco-friendly.

2. Leverage fully managed services where possible

Fully managed offerings inherently are typically more efficient than manually operated ones. For instance, consider the following scenarios;

  • Running an application on Google Compute Engine (GCE) with auto scaling capabilities is more efficient than running it on a static size server farm on-premise

  • An application that can be containerized (GKE) can offer higher server density resulting in reduced resource usage

  • On the far end of the spectrum, if your workloads can leverage fully managed serverless offerings such as Cloud Run (for containerized workloads) / Cloud Functions (based on FaaS) / BigQuery (serverless data warehouse), these can provide greater efficiencies not just with economies of scale but also can reduce idle resources through smart resource allocation and scaling based on demand.

This may tie directly to your organization’s cloud maturity level and adoption. In a hybrid environment you may still have some monoliths that warrant the use of larger machines/VMs but still having this understanding will help you with your longer term “socially responsible” goals.

3. Optimize resource needs and utilization

This may sound like a cliche. 

But, just having the ability to track, monitor and report on CFE metrics is a great starting point. Solutions like Active Assist can offer insights and recommendations to optimize your cloud usage. Here are some things to ponder upon;

  • Carrying on-premise mindset to cloud - Over-provisioning of VMs / server resources result not just in increased costs, but worse, negative CFE scores

  • Forgot to close the faucet - Unused, running VMs are just bad practice. Having a governance model in place would go a long way in reducing emissions. For VMs that are only needed at certain times try leveraging Scheduled VMs option. The Idle VM recommendation option can be handy to identify and turn down idle VM resources.

  • Try refactoring your monolith applications into microservices. Microservices are nimble, scale independently and can offer higher efficiencies and lower TCO - in stark contrast to monoliths with a significantly larger carbon footprint

  • Lifting & Shifting workloads? This is not a bad option especially if you are just embarking on your cloud journey. Oftentimes, this may be the only option. However, think of creative ways to optimize it for the cloud. Can you separate the analytics capabilities? Containerize peripheral applications / services?

  • Consider running your standalone batch workloads in a region with higher CFE%

  • Preemptible VMs can be a great way to save on operational costs plus reduce your carbon footprint - if you have long running yet stateless workloads that can take advantage of.

Ready to embark on your socially responsible app development?

What is your level of cloud maturity?


Friday, December 25, 2020

Build your own COVID-19 Data Analytics Leveraging Open Source - Serverless Fn & Autonomous Database

Firstly, a huge thanks to all the frontline workers, healthcare professionals, technology contributions, non-profit organizations and everyone else who are fighting the pandemic everyday amidst the risks involved and more importantly the sacrifices - including the most important thing that can never be retrieved - TIME.

As Oracle Cloud continues to aid COVID-19 vaccine research & trials and also help manage the COVID-19 vaccine distribution program, I came up with a simple (humble) contribution that helps analyze and report COVID-19 data.

Although there are several dashboards and reports readily available on the web, they cater to a specific region, locale & metric. For example, what if? one would like to understand the total COVID-19 cases reported in Belarus on/up-to a specific date sliced by total tests conducted, new cases that were tested positive, hospitalization ratio per million against a median age group?

Data is publicly available but making sense of it is the key. In this article, we will see how we can build our own COVID-19 analytics - leveraging simple & popular open source tools and technology with the help of public cloud (Eg. Autonomous Database in Oracle Cloud - that is auto-managed, auto-tuned & auto-patched) [PS: You can also replace Autonomous DB with MySQL]. What's more? - we can potentially achieve this for free with the Always Free cloud instance.

Let's quickly take a look at the high-level solution architecture;


Each component of this architecture is designed to be loosely coupled and can be replaced or further enhanced for functionality/convenience. For instance, I have leveraged Apache open-source serverless Fn project that natively runs on Oracle Cloud - this can be replaced with a Node.js or java code running on K8s-Docker container. Similarly, the Autonomous Database can be replaced with a MySQL DB 

Let's take a look at the key components of this solution;

1) Source dataset [Courtesy: Our World in Data]

"Our World in Data" offers COVID-19 data in a variety of data formats and most importantly offers daily updates - so we can keep our analytics up-to-date.

In our case, we get the dataset in csv format here

2) Python script deployed on Oracle Fn

I have leveraged the Oracle Cloud Functions (based on Apache Fn) serverless platform to deploy the simple python script to download the COVID-19 dataset into an object storage bucket.

The choice of Oracle Cloud Functions is particularly helpful in this case because I don't have to manage any infrastructure or deal with packaging the docker container and version control them. It lets me focus only on the business logic. Also, it supports a variety of programming languages natively including Python, Go & Java. Most importantly, it has built-in security and offers out of the box support for event notifications & triggers and ability to expose functions as APIs.

Pre-Req: 

Create a dynamic group for Oracle Functions and ensure you have a policy defined in your compartment/tenancy for Oracle Functions the ability to access / read csv in the object storage bucket

Instances that meet the criteria defined by any of these rules will be included in the dynamic group.

ALL {resource.type = 'fnfunc', resource.compartment.id = 'ocid1.compartment.oc1..abcdefgxyz'}

Allow dynamic-group FnDynamicGroup to manage objects in compartment sathya.ag

Let's create an Oracle Functions application;

Oracle Cloud -> Developer Services -> Functions -> Create Application & give it a name

Applications lets us group several Oracle Functions. To create a serverless function;

For quick setup, you can leverage the Cloud Shell under the Getting Started instructions to deploy the following python code. Oracle functions platform packages the code as a docker container, uploads the docker image to the default Oracle Cloud docker registry and automatically deploys it as a serverless function with an invoke endpoint.

import io
import json
import oci
import csv
import requests
import logging
import os
import urllib.request

from fdk import response

def progress_callback(bytes_uploaded):
    print("{} additional bytes uploaded".format(bytes_uploaded))

def handler(ctx, data: io.BytesIO=None):
    logging.getLogger().info("Got incoming request")
    signer = oci.auth.signers.get_resource_principals_signer()
    object_name = bucket_name = namespace = ""
    try:
        cfg = ctx.Config()
        input_bucket = cfg["input-bucket"]
        processed_bucket = cfg["processed-bucket"]
        input_csv = cfg["input-csv"]
        object_name = cfg["object-name"]
    except Exception as e:
        print('Missing function parameters: bucket_name', flush=True)
        raise
    logging.getLogger().info("before calling load data {0} {1} {2}".format(input_bucket,
 input_csv, object_name))
    
    
    logging.getLogger().info("download start!")
    filename, headers = urllib.request.urlretrieve(input_csv, filename="/tmp/covid.csv")
    logging.getLogger().info("download complete!")
    
    load_data(signer, namespace, input_bucket, filename, object_name)
    #move_object(signer, namespace, input_bucket, processed_bucket, object_name)

    return response.Response(
        ctx, 
        response_data=json.dumps({"status": "Success"}),
        headers={"Content-Type": "application/json"}
    )
    
def load_data(signer, namespace, bucket_name, input_csv, object_name):
    logging.getLogger().info("inside load data function {0} {1} {2}".format(signer, 
namespace, bucket_name))
    client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
    try:
        print("INFO - About to read object {0} from local folder and upload to bucket {1}
...".format(object_name, bucket_name), flush=True)
        namespace = client.get_namespace().data
        
        # Use UploadManager to do multi-part upload of file, with 3 Parallel uploads
        logging.getLogger().info("before calling uploadmanager")
        upload_manager = oci.object_storage.UploadManager(client, 
allow_parallel_uploads=True, parallel_process_count=3)
        response = upload_manager.upload_file(namespace, bucket_name, object_name, 
input_csv, progress_callback=progress_callback)
        logging.getLogger().info("response status {0}".format(response.status))
        if (response.status == 200):
            
            message = "Successfully  uploaded %s in bucket %s." % (object_name, 
bucket_name)
            return True

    except Exception as e:
        logging.getLogger().info("exception message {0}".format(e))
        message = " Image upload Failed  in bucket %s. " % bucket_name
        if "oci" in e.__class__.__module__:
            if hasattr(e, 'message'):
                message = message + e.message
            else:
                message = message + repr(e)
                print(message)
        return False

In the configuration section, provide the key value pairs that can be dynamically processed by Oracle Functions at invoke.

3) COVID Dataset in Object Storage

Let's verify if the COVID-19 dataset is successfully downloaded into our object storage bucket.

4) Loading COVID-19 dataset into Autonomous Database

Since we are leveraging Oracle Cloud Autonomous DB, we can leverage the OOTB cloud SQL package to load data from an external object storage bucket. Another variant of this approach could be to leverage the csv data in object storage as an external table and ADB lets you query on the csv data directly.

Again, the choice for Autonomous Database helps us focus only on loading and querying the dataset and not have to worry about the underlying infrastructure, patching & maintenance. I like to think of Autonomous DB as a "serverless" database.

Execute the DBMS_CLOUD.COPY_DATA procedure to load data from object storage into ADB. [Ensure you have the base table created to hold the COVID-19 dataset]

BEGIN
 DBMS_CLOUD.COPY_DATA(
    table_name =>'COVID',
    credential_name =>'JSON_CRED_AG',
    file_uri_list =>'YOUR OBJECT STORAGE BUCKET URI',
    format => json_object('type' value 'CSV', 'ignoremissingcolumns' value 'true', 
'delimiter' value ',', 'blankasnull' value 'true', 'skipheaders' value '1', 'dateformat' value 'YYYY-MM-DD')
 );
END;
/

5) Analytics Dashboard

Now that we have our data in a database, we can leverage any analytics / reporting engine to make sense of the data and generate dynamic reports. In our case, we leverage Oracle Analytics and/or Oracle Data Visualization.



Tuesday, September 29, 2020

Mitigate Ransomware Attacks & Protect your data with Oracle Cloud

Recently, I was working with a Fortune100 retailer. During a cadence with their Chief Technology Officer & Security Advisor, an interesting topic came up for discussion. With ever growing malware attacks - especially Ransomware, the board mandated IT to prioritize strategy to mitigate, prevent & protect their crown jewel (data) against potential Ransomware attacks.

Board concerns included;

  • Protecting Brand Reputation
  • Immediate need for a cost-effective business continuity plan (BCP)
  • Security Compliance

Enterprises across the world - both large & small - have been impacted by Ransomware and incurred several billion dollars in losses - either through loss of business, time to recover and/or ransom costs.

Per wikipedia...

Security experts have suggested precautionary measures for dealing with ransomware. Using software or other security policies to block known payloads from launching will help to prevent infection, but will not protect against all attacks. As such, having a proper backup solution is a critical component to defending against ransomware. Note that, because many ransomware attackers will not only encrypt the victim's live machine but it will also attempt to delete any hot backups stored locally or on accessible over the network on a NAS, it's also critical to maintain "offline" backups of data stored in locations inaccessible from any potentially infected computer, such as external storage drives or devices that do not have any access to any network (including the Internet), prevents them from being accessed by the ransomware.

As hackers find new & creative ways to disrupt global businesses with malicious intent - Reveton, Fusob, WannaCry, BadRabbit, Petya (Remember NotPetya?), SamSam - all different strains of Ransomware over the years that have caused billions in losses, it might sound impossible to predict but certainly possible to prevent, protect & mitigate the impact & damage; should there ever be one.

In this blog, I would like to share my perspective and solution on how we helped the customer by leveraging Oracle's Gen2 Cloud Infrastructure services.

One of the core tenets of security to prevent against Ransomware like malware attacks is to maintain consistent, redundant, secure "offline" backups of critical data - since Ransomware can traverse network.

Our proposal encompassed 3 primary factors that are key for enterprise workloads to run uninterrupted;

1. Enterprise Grade Secure Backups & Cloud Storage

Oracle's Gen2 Cloud offers secure, redundant & enterprise grade cloud backup & storage solution aimed at not just backing up on-premise data (offline backups) but also services that manage & automate consistent on-premise data backups. Specifically the following built-in features offer an immutable, versioned, consistent, redundant & secure storage for all kinds of enterprise data.

  • Two distinct storage tiers for hot & cold backup storage
  • Secure & Restricted access with fine-grained IAM policies
  • Object versioning to prevent accidental/malicious object overwrites/deletion (CRUD)
  • Default AES-256 bit encryption with ability to auto/self managed keys
  • Rich lifecycle automation policies
  • Retention rules to comply with regulatory compliance and ensure data immutability
  • Configurable Replication policies for data redundancy cross-region
  • Self-healing to ensure data integrity

In additions, 

Oracle Storage Gateway offers the ability to deploy the solution with zero disruption as it exposes cloud storage as an NFS locally &

Oracle database backup service automates the management of Oracle database backups from on-premise to cloud

2. Ensure Business Continuity - Not just offline backups for fallback

Oracle cloud Gen2 prides itself on the fact that it is purpose built for the enterprise. With fundamental building blocks at its core such as "off-box virtualization", non-oversubscribed everything (network, BW, compute & storage), defense-in-depth layered security-first cloud architecture & unique offerings such as modern AMD, Intel, Nvidia GPUs, HPC, RDMA clustered networking, NVMe & Exadata, customers can rely on Oracle Cloud and treat it just as an extension of their on-premise IT.

This provides the ability to spin up VMs, Bare Metal servers, VMWare workloads, Databases (Oracle DB VMs, Physical DBs, MySQL, Exadata, Autonomous, SQL Server) - everything potentially needed to ensure business continuity.

3. Security-First Cloud Architecture & Compliance

At its core, Oracle Cloud offers built-in;

  • Edge-Security through Global PoPs, DDoS protection, DNS security & WAF
  • Monitoring with 3rd party security (FW, NGFW, IPS), configuration monitoring, logging & compliance
  • Virtual Network interface segmentation, Security Lists, IPSec VPN, FastConnect & Private Network
  • Tenant isolation, Hardened Images, HW Entropy, Root-of-Trust Card, HSM & signed firmware
  • Data (At-Rest, In-Transit & Key Vault Management)
  • Identity federation, role-based policies, compartments, tagging and instance principals

In additions, 

Fine-grained IAM security policies to secure & restrict resource access at the finest level,

Multi-Factor Authentication (MFA) for additional layer of user security

CASB for OCI offers visibility, threat protection, data security and compliance for OCI deployments.

Below is the reference architecture that addresses Ransomware prevention & mitigation strategy for deployments & data in the Oracle Cloud.

Feel free to reach out if you have a criticism, feedback or queries.

Monday, September 23, 2019

Automate Oracle APEX Deployment for OCI Database Using Terraform

Recently, I was assisting a customer design & configure APEX (Oracle Application Express) for databases on OCI. The purpose of this blog is to augment some finer details, tips & tricks that may help with successful installation (already documented in some detail in this whitepaper.)

Let's quickly look at the architecture / deployment topology. In this example, we will have one centralized APEX instance mapped to multiple database instances. The APEX instance will be provisioned in a public subnet so it can be accessed from the internet. However, the OCI databases will be secured within one or more private subnets.

Step 1: Download Terraform release 0.11.15 (oci) here



Step 2: Depending on your OS, install/configure Terraform on your laptop/PC. For example, if you are running on a mac, download the darwin_amd64 and unzip this in a folder.

Step 3: Copy the "terraform" to your /usr/local/bin directory to install/configure Terraform on your machine

$ unzip terraform_0.11.15-oci_darwin_amd64.zip
$ cp terraform /usr/local/bin

Step 4: Download the APEX terraform template & scripts here. This contains the terraform templates & scripts to install and configure ORDS & APEX on a OCI Compute VM within a public subnet.

Step 5: Create a bucket within your OCI object storage. Download the following and place them within this bucket. The Terraform script will use these to install the appropriate versions of APEX, ORDS & web server.

Hint: You can make this bucket public briefly if you don't want to bother with pre-auth requests etc.. since this will only hold the binaries. Once done, we can change the visibility back to private or blow this bucket up.

a) Download the latest APEX binary here
b) Download the latest ORDS binary here
c) Optionally (if you prefer running APEX on Tomcat) download the latest tomcat zip

Note: Make sure the Apex and ORDS versions are compatible with the version of database you provisioned on OCI.

Step 6: On your OCI tenancy, make sure you have a public subnet within your VCN, attached a internet gateway and a security list. The security list must have the following ingress rules;

0.0.0.0/0           TCP          TF_VAR_COM_PORT (Fetch this based on the port on which you would expose APEX over.)

Step 7: Open the ingress rule on the private subnet (where your database is running) to allow the port (eg., 1521) from the public subnet (where APEX will be deployed)

Step 8: For simplicity, we will be exposing APEX over ip-based access. If you prefer to frontend your apex installation with a DNS, follow the whitepaper reference above. Does a great job offering DNS options.

Step 9: Unzip the ORDS zip file downloaded in Step 4 above.

Step 10: Run terraform --version and make sure you are running on 0.11.15 version. This is important as the terraform template & scripts are written based on this version. You may encounter errors trying to run this AS-IS with later versions of terraform.

Step 11: Gather the following info before proceeding. You will need access to your OCI tenancy to gather lot of details.

a) Generate a pair of SSH public & private keys. These will be used while provisioning the new compute VM that will host APEX & ORDS. Save these for future ssh access to your apex/ords compute VM. Gather the absolute paths for both private & public keys.
b) Generate a API key and fingerprint for your user id (Remember, this is the user that will be used by terraform OCI provider to make API calls into OCI). If you don't know how to generate a API signing key and fingerprint refer to OCI documentation here.
c) OCI Tenancy OCID
d) OCI User ID OCID
e) OCI User Fingerprint
f) OCI Compartment OCID
g) Target Database private IP address
h) Target Database Service Name (If you are running on a multitenant database, make sure you provide the PDB service name and not the root CDB service name).

Hint: Click on DB Connection button on the OCI DB console. It shows the CDB root connection info along with the service name info. Simply replace the CDB domain name with the PDB name. This should look like <<pdb>>.<<subnet>>.<<vcn>>.oraclevcn.com





i) Region: This is the region identifier where you have the database running & eventually APEX. Get your region identifier here.
j) AD: Availability Domain where you would like to install APEX/ORDS. eg., 1, 2 or 3
k) OEL Version: APEX Compute VM will be provisioned on OEL OS. Indicate which version of OEL you would like installed. eg., 7.6
l) Compute Instance Name & Display Name (Choose an appropriate name)
m) Instance Shape: eg., VM.Standard2.2 for a 2 OCPU VM
n) Object Storage URLs for APEX zip, ORDS war file & optionally tomcat zip files
o) Apex/ORDS webserver port number eg., 8080, 8888 etc..
Hint: Remember to open this port on the public subnet security list

Step 12: cd into the ORDS-APEX_Comp directory and run setup.sh. This script will prompt for all the values described above. Once this script is executed, this creates a env-vars file that contains these values for Terraform to use. Alternatively, you can directly edit this file and provide values.

Step 13: Once you make sure all variables are set properly, execute the following;

Hint: You may be prompted for the DB admin password.

terraform init

terraform plan (This should indicate 5 actions that will be performed on your OCI tenancy). This would change depending on whether you choose Tomcat or Jetty as your web server.

+ null_resource.remote-exec_init
      id:                                       <computed>

  + null_resource.remote-exec_tomcat-1
      id:                                       <computed>

  + null_resource.remote-exec_tomcat-2
      id:                                       <computed>

  + null_resource.remote-exec_tomcat-apex
      id:                                       <computed>

  + oci_core_instance.ORDS-Comp-Instance
      id:                                       <computed>

terraform apply

This should run for approximately 15-20 mins. Make sure there are no errors while you run this script. If there are errors, execute "terraform destroy" to rollback all changes. Also if APEX/ORDS is partially deployed on the database (by terraform), clean this up manually. Follow the documentation here.

Some common causes of errors;
1) You may have installed APEX manually on this database before. This will conflict with the terraform APEX install attempt.
2) Terraform script partially executed and failed during APEX install. (Hint: One way to find this out is by observing a bunch of PL/SQL ORA-* errors during terraform execution)

Solution is to clean up APEX manually and ensuring terraform is rolled back using the "destroy" command.

To add / map multiple databases to this APEX/ORDS instance, simply execute the apex_add_db.sh script bundled with the ORDS Terraform scripts.

$ ./apex_add_db.sh -p <database_admin_password> -i <IP_address> -s <database_service_name>

Hope this helps quickly spin up APEX for multiple DB instances.