Comparison of HDFS and GFS:

Hadoop Distributed File System (HDFS) | Google File System (GFS)
Cross-platform | Linux
Developed in a Java environment | Developed in a C/C++ environment
At first it was developed by Yahoo; now it is an open-source framework | It was developed by Google
It has a Name node and Data nodes | It has a Master node and Chunk servers
128 MB is the default block size | 64 MB is the default block size
The Name node receives heartbeats from the Data nodes | The Master node receives heartbeats from the Chunk servers
Commodity hardware is used | Commodity hardware is used
WORM – Write Once, Read Many times | Multiple-writer, multiple-reader model
Deleted files are renamed into a particular folder and are removed after three days if not in use | Deleted files are not reclaimed immediately; they are renamed into a hidden namespace and removed via garbage collection
No network stack issue | Network stack issue
Journal, editlog | Operational log
Only append is possible | Random file writes are possible
Unit 5 Virtualization and Cloud Computing | Cloud security | Cloud Database
May 14, 2023
Q.1. List different types of Cloud risks.
Ans:
1. Privacy and Organizational Risks
2. Technical Risks
3. Legal risks
4. Other risks
Q.2. Explain Privacy and Organizational risks.
ABC Corp, a mid-sized company in the healthcare sector, is considering moving its IT infrastructure to the cloud. The company is concerned about maintaining the privacy and security of its sensitive patient data, as well as the potential risks associated with cloud computing.
What are the privacy and organizational risks associated with cloud computing for ABC Corp?
Ans:
1. Lock-in: Cloud lock-in (also known as vendor lock-in or data lock-in) occurs when transitioning data, products or services to another vendor's platform is difficult and costly, making customers more dependent (locked in) on a single cloud storage solution.
2. Loss of governance: The loss of governance in cloud computing occurs when businesses migrate workloads from an exclusively on-premises IT infrastructure to the cloud without a suitable governance policy in place.
3. Compliance challenges: Cloud compliance is the art and science of complying with regulatory standards of cloud usage in accordance with industry guidelines and local, national, and international laws.
4. Cloud service termination or failure: Cloud services are expected to offer 24x7 support and high availability, but in the competitive IT market an inadequate business strategy, lack of financial support and other factors can force a provider to go out of business or shut down parts of its service portfolio. As a result, some cloud computing services may be terminated at short or medium notice.
5. Supply chain failure: A supply chain failure is a breakdown, caused either by internal operations or by external suppliers, that has a significant quality, delivery or cost impact on the company and/or its customers. Blame is often placed on outside suppliers or contract manufacturers without a proper root-cause analysis.
Q.3. Explain Technical risks in cloud computing?
XYZ Corp is a large multinational company that has recently decided to move its entire IT infrastructure to the cloud. The company has concerns about the technical risks associated with cloud computing. What are some of the technical risks associated with cloud computing for XYZ Corp?
Ans:
1. Isolation failure: multi-tenancy and shared resources are defining characteristics of cloud computing. This risk category covers the failure of mechanisms separating storage, memory, routing and even reputation between different tenants.
2. Resource exhaustion: Resource-exhaustion attacks generally exploit a software bug or design deficiency. In software with manual memory management (most commonly written in C or C++), memory leaks are a common bug exploited for resource exhaustion.
3. Cloud provider malicious insider: A malicious insider is an insider who intends to cause damage to the organization for personal gain. Because of their access to, and knowledge of, the organization's most valuable assets, attacks involving malicious insiders are harder to identify and remediate than those that originate from outside the organization.
4. Intercepting data in transit: Data is vulnerable while it is being transmitted. It can be intercepted and compromised as it travels across networks that are outside the user's direct control, so data should be encrypted when in transit. Encryption makes the data unreadable if it falls into the hands of unauthorized users (a minimal encryption sketch follows this list).
5. Insecure or ineffective deletion of data: When it comes to deleting or completely destroying old data from your computer, laptop, hard drive or other media devices, it is vital to keep safety and security the main priorities. Many people and even companies often use unsafe methods to destroy or erase confidential data. Simply deleting or reformatting your computer may not be secure or safe enough. Continuing to practice poor data destruction methods will inevitably lead to identity theft and data breaches.
6. Loss of encryption keys: This includes the disclosure of secret keys (e.g. file-encryption keys or customer private keys) or passwords to malicious parties, as well as the loss or corruption of those keys.
7. Malicious probes or scans: Malicious probes or scans are indirect threats to the assets being considered. They can be used to collect information in the context of a hacking effort. A probable impact is a loss of confidentiality, integrity and availability of services and data.
8. Compromise of the service engine: Cloud providers rely on a specific service engine that sits on top of the physical hardware. For IaaS this can be the hypervisor; for PaaS it can be the hosted application platform. Hacking the service engine can allow an attacker to escape the isolation between tenants.
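Items 4 and 6 above are usually addressed with encryption and careful key management. Below is a minimal sketch, assuming the third-party cryptography package is installed; the payload and the key handling are purely illustrative.

```python
# pip install cryptography  (assumption: the third-party package is available)
from cryptography.fernet import Fernet

# Generate a symmetric key; in practice this would live in a key-management
# service, because losing or leaking it is exactly risk 6 above.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt before the data travels across the network (risk 4).
payload = b"patient-record-42: blood type O+"   # hypothetical sensitive data
token = cipher.encrypt(payload)

# Anyone intercepting `token` in transit sees only ciphertext.
print(token)

# Only a holder of the key can recover the original bytes.
print(cipher.decrypt(token))
```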
Q.4. Explain Legal Risks.
LMN Corp is a mid-sized company that operates in the financial sector. The company is considering moving its IT infrastructure to the cloud, but is concerned about the legal risks associated with cloud computing. What are the legal risks associated with cloud computing for LMN Corp?
Ans:
1. Risk from changes of jurisdiction: Customer data may be kept in several jurisdictions, some of which may be high risk. If data centres are located in high-risk countries, sites could be raided by local authorities and data or systems subjected to enforced disclosure or seizure.
2. Licensing risks: Licensing conditions, such as per-seat agreements, and online licensing checks may become unworkable in a cloud environment.
3. Data protection risks: It can be difficult for the cloud customer to effectively check the data processing that the cloud provider carries out and hence be sure that data is handled in a lawful way.
Q.5. Explain about Other risks involved in Cloud Computing.
Ans:
1. Backup lost or stolen: This risk arises from inadequate physical security procedures, AAA (authentication, authorization and accounting) vulnerabilities, and user provisioning and de-provisioning vulnerabilities.
2. Unauthorized access to premises: Because of inadequate physical security procedures, unauthorized access to data centres is possible. Cloud providers generally run large data centres, so physical control of a data centre must be stronger because the impact of a breach would be higher.
3. Theft of computer equipment: This risk is possible because of inadequate physical security procedures. It mainly concerns the data centres, and a dual-authentication mechanism should be used to access those machines.
4. Natural disasters: Natural disasters can occur at any time, so there must be a solid disaster recovery plan. That said, the risk is lower than with traditional infrastructure because cloud providers offer redundancy and fault tolerance by default; for example, AWS has multiple physical regions and several availability zones within each region (illustrated in the sketch below).
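As a small illustration of the region and availability-zone redundancy mentioned in point 4, the sketch below, assuming the boto3 package and configured AWS credentials, simply lists the available regions and the availability zones of one region.

```python
# pip install boto3  (assumes AWS credentials are already configured)
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Every region is an independent physical location...
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]
print("Regions:", regions)

# ...and each region contains several isolated availability zones.
zones = [z["ZoneName"] for z in ec2.describe_availability_zones()["AvailabilityZones"]]
print("Availability zones in us-east-1:", zones)
```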
Q.6. Explain Security Architecture and its layers.
Ans:
1. Data Centre Layer: This layer is related to traditional infrastructure security concerns. It consists of physical hardware security, theft protection, network security and all physical assets security.
2. VM Layer: This layer involves VM level security issues, VM monitoring, hypervisor-related security issues and VM isolation management issues.
3. Service provider layer: This layer is responsible for identity and access management, service level agreements (SLA), metering, compliance and audit-related issues.
4. User layer: This is the first layer of user interaction. It is responsible for user authentication and authorization and all browser-related security issues.
Layered view (from the user layer down to the data centre):
User Layer [User authentication, Browser security]
Service Provider Layer [Identity and access management, SLA, Audit, Compliance, Metering]
VM Layer [VM-level security, Hypervisor security, Isolation management]
Data Centre Layer [Physical hardware security, Network security]
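The same layer stack can be captured as a simple data structure, for example to drive a security checklist; the sketch below is illustrative only and just restates the layers listed above.

```python
# Illustrative only: the cloud security architecture layers from Q.6,
# ordered from the user-facing layer down to the physical data centre.
SECURITY_LAYERS = {
    "User Layer":             ["User authentication", "Browser security"],
    "Service Provider Layer": ["Identity and access management", "SLA",
                               "Audit", "Compliance", "Metering"],
    "VM Layer":               ["VM-level security", "Hypervisor security",
                               "Isolation management"],
    "Data Centre Layer":      ["Physical hardware security", "Network security"],
}

for layer, concerns in SECURITY_LAYERS.items():
    print(f"{layer}: {', '.join(concerns)}")
```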
Q.7. Explain VM Security Challenges.
PQR Corp is a large enterprise that uses virtual machines (VMs) to run its applications and services. The company is concerned about the security challenges associated with VMs. What are the security challenges associated with VMs for PQR Corp?
Ans:
1. Virtual machine escape:- It is an exploit in which the attacker runs code on a VM that allows an operating system running within it to break out and interact directly with the hypervisor. Such an exploit could give the attacker access to the host operating system.
2. VM hopping: VM hopping is a common attack mode in virtualization security. An attacker who controls one virtual machine attempts to gain access to other virtual machines on the same hypervisor and then attack them.
3. Virtualization sprawl: It is a phenomenon that occurs when the number of virtual machines (VMs) on a network reaches a point where administrators can no longer manage them effectively. Virtualization sprawl is also referred to as virtual machine sprawl, VM sprawl or virtual server sprawl.
4. Insecure VM migration: A workload cannot migrate to a destination server that lacks the computing resources required to support it; migration problems occur when the destination server has too few processor cores, too little memory, too few NIC ports or a storage shortage and cannot reserve resources for the new workload. From a security standpoint, the VM's memory and disk state also travel over the network during migration, so an unprotected migration channel exposes that state to interception or tampering.
5. Sniffing and spoofing: Spoofing is when an attacker crafts TCP/IP packets using another host's IP address. In packet sniffing, sniffer software is placed between two interacting endpoints; the attacker pretends to be one end of the connection and snoops on the data sent between the two points.
Q.8. Explain Cloud Database and operational model for cloud database.
Ans:
A cloud database is a database that runs on a cloud computing platform like Amazon EC2, Rackspace or GoGrid. There are two ways to deploy a database: users can either run the database inside a secured virtual machine (VM) or subscribe to a particular database service managed by a cloud service provider. Currently there are both SQL-based and NoSQL-based database offerings.
1. VM image:- Cloud platforms allow users to purchase VM instances for a limited time, and a cloud provider facilitates more security for running databases inside a VM. Users can upload their own VM image and run the database inside it, or use a preinstalled database image. Oracle, for example, provides a preinstalled image with Oracle Database 11g for Amazon EC2 instances.
2. Database as a service:- Some cloud platform and infrastructure providers offer database services just like their other service offerings, in which case there is no need to launch an instance or individual VM for database installation. All database licensing, updating and configuration are managed by the cloud provider, and application owners pay each month on a pay-per-use basis for the database volume they consume. AWS offers many database services to its customers, including relational database services (RDS) and NoSQL services, such as Amazon RDS, Amazon DynamoDB, Amazon SimpleDB and Amazon Redshift.
Traditional application owners prefer RDS, where users can choose among MySQL, Oracle, SQL Server and PostgreSQL database engines. All enterprise licensing issues and updates are taken care of by the provider.
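To make the database-as-a-service model concrete, here is a minimal sketch, assuming the boto3 package and valid AWS credentials, that asks Amazon RDS to provision a small MySQL instance; the instance identifier, credentials and sizes are placeholder values.

```python
# pip install boto3  (assumes AWS credentials are configured)
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Ask the provider to create a managed MySQL instance; there is no VM to
# install, license or patch ourselves.
rds.create_db_instance(
    DBInstanceIdentifier="demo-db",        # placeholder name
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    MasterUsername="admin",
    MasterUserPassword="change-me-now",    # placeholder credential
    AllocatedStorage=20,                   # GiB
)

# Pay-per-use billing starts once the instance reaches the "available" state.
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier="demo-db")
print("RDS instance is ready")
```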
Q.9. Explain common characteristics and architectural characteristics of Cloud Database.
Ans:
1. Fast Deployment: Cloud databases are the perfect choice when you urgently need a database, as they can be up and running in minutes. Cloud databases eliminate the need to purchase and install hardware and set up a network.
2. Accessibility: Users have quick access to cloud databases remotely through the web interface.
3. Scalability: You can expand cloud database storage capacity without disruption to meet changing requirements. Cloud database scalability is seamless thanks to the DBaaS model, a major benefit for growing businesses with limited resources.
4. Disaster Recovery: Data backups are regularly performed on cloud databases and kept on remote servers. These backups enable a business to stay online in cases of natural disasters, equipment failure, etc.
5. Lower Hardware Costs: Cloud database service providers supply the infrastructure and perform database maintenance. Hence, companies invest less in hardware and have fewer IT engineers for database maintenance.
6. Value for Money: Many DBaaS solutions are available in multiple configurations, allowing companies only to pay for what they use and turn off services when they don't need them. Cloud databases also save money by not requiring operational costs or expensive upgrades.
7. Latest tech: Cloud database providers upgrade their infrastructure and keep it updated with new technology. This brings significant savings, as companies don't have to allocate funds for new tech or staff training.
8. Security: Most cloud database providers encrypt data and invest in the best cloud security solutions to keep the databases safe. Although there is no impenetrable security system, it is a safe way to protect data. Since cloud database providers use automation to enforce the best security practices, there is less room for human error compared to using on-premises databases.
Q.10. Explain types of Cloud Databases.
Ans:
Many cloud providers offer relational database services nowadays. Some popular and widely adopted offerings across the globe are as follows:
1. Amazon relational database service: Amazon RDS is a very popular and widely adopted web service. It looks like other AWS services and provides an easy management console for operating RDS in the cloud. Amazon RDS is a highly cost-efficient and secure service; currently it supports Oracle, SQL Server, MySQL and PostgreSQL databases. Amazon RDS specifically offers two types of instances.
On-demand instances: An on-demand instance offering is a pay-per-use instance with no long-term commitment.
Reserved DB instances: Reserved DB instances give the flexibility of a one-time payment for the DB instance if the database usage is predictable. They also offer a 30%-50% price reduction compared with the on-demand price.
2. Google cloud SQL: Google cloud SQL is a MySQL database service managed by Google; the entire management, data replication, encryption, security and backups are handled by Google's cloud infrastructure. Google claims maximum availability of its data because its data centres are located in every region of the world.
3. Heroku Postgres: Heroku Postgres is a relational SQL database offered by Heroku and accessible through all programming languages that Heroku supports. It is basically provisioned as an add-on service. Heroku Postgres offers fully reliable service, with around 99.99% uptime and 99.999999999% durability of data. One of its advanced features is Dataclips, which lets users share the results of a SQL query via a URL (a connection sketch follows this list).
4. HP cloud relational database for MySQL: HP cloud RDS automates application deployment, configuration management and database patching tasks. It currently supports a command line interface (CLI). It also provides a database snapshot facility across multiple availability zones for greater reliability. It is built atop an OpenStack-based MySQL distribution, which provides database interoperability from one cloud provider to another.
5. Microsoft Azure SQL database: Earlier known as SQL Azure, it is a key component of the Microsoft Azure cloud service; however, it can also be operated as a standalone cloud database. The database can be synchronized easily with other SQL Server databases within the cloud infrastructure of a company or organization. With Microsoft Azure SQL database, database performance is predictable irrespective of whether the basic, standard or premium service tier is chosen.
6. Oracle database cloud service: Oracle database cloud offers two options: a single schema-based service, and a fully configured Oracle database installed in a virtual machine. An Oracle database can be provisioned quickly, and the user can spin up a database instance with just a few clicks. It also provides flexibility in the management option: self-managed, or fully managed by Oracle.
7. Rackspace cloud databases: Rackspace cloud databases are based on open standards and currently support the MySQL, Percona and MariaDB databases. Rackspace provides high database performance using container-based virtualization and automated configuration, which reduces operational costs and team effort. It is built on top of open-source technology such as the OpenStack cloud platform.
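Whichever managed relational service is chosen, an application connects to it with an ordinary client library and a connection string supplied by the provider. Below is a minimal sketch, assuming the psycopg2-binary package and a provider-supplied DATABASE_URL environment variable (as Heroku Postgres exposes); nothing here is specific to a single vendor.

```python
# pip install psycopg2-binary  (assumes the provider exposes a DATABASE_URL)
import os
import psycopg2

# The connection string (host, port, user, password, database) comes from the
# cloud provider; the application never manages the database server itself.
conn = psycopg2.connect(os.environ["DATABASE_URL"])

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])

conn.close()
```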
Q.11. List down the limitations of Existing databases.
Ans:
Following are some of the key limitations that became the reason behind the birth of NoSQL databases. Traditional databases are unable to:
• Store data at TB/PB scale; even a good processor cannot process millions of rows efficiently.
• Process TBs of data on a single machine.
Q.12. List down some types of No-SQL Database.
Ans:
• Key-value store: Based on table keys and values (e.g. AWS DynamoDB); see the sketch after this list.
• Document-based store: Document-based database stores records that are made of tagged elements (e.g. MongoDB, CouchDB).
• Column-based store: Data divided into multiple columns and every storage block contains data of each column (e.g., Apache HBase, Cassandra).
• Graph-based store: A network-graph store that uses nodes and edges to store data (e.g. Neo4j).
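To make the key-value model concrete, below is a minimal sketch, assuming boto3, AWS credentials and a pre-existing DynamoDB table named users with partition key user_id (both hypothetical names), that writes and then reads back one item.

```python
# pip install boto3  (assumes a DynamoDB table "users" with partition key "user_id")
import boto3

table = boto3.resource("dynamodb", region_name="us-east-1").Table("users")

# Key-value store: every item is addressed by its key; there is no fixed schema.
table.put_item(Item={"user_id": "u42", "name": "Alice", "plan": "premium"})

item = table.get_item(Key={"user_id": "u42"})["Item"]
print(item["name"], item["plan"])
```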
Q.13. Explain Distributed File System Basics and some popular file systems.
Ans:
A distributed file system (DFS) is basically used for storing huge amounts of data and provides access to the stored data for all distributed clients across the network. The objective of a DFS is to provide all geographically distributed users with a common file system for data sharing and storage.
An Internet search engine is the most common example of DFS, which is used for indexing millions of Web pages. There are a number of DFS that solve this problem in different ways. Some popular file systems are:
1. Andrew file system (AFS)
2. Network file system (NFS)
3. Microsoft distributed file system (DFS)
4. Apple filing protocol (AFP)
5. Google file system (GFS)
6. Hadoop distributed file system (HDFS)
Q.14. Explain Google File System Architecture.
Ans:
A Google file system cluster contains a single master and multiple chunk servers, which are accessed by many clients. The master holds the metadata about the chunk servers; all data processing happens through the chunk servers. The client first contacts the master to retrieve the metadata describing which chunk servers hold the data, and from then on connects to the chunk servers directly (a sketch of this read path appears after the list below).
1. Chunk: A chunk is very similar to the concept of a block in a file system, but the chunk size is larger than a traditional file system block: a chunk is 64 MB. This size was chosen specifically for the Google environment.
2. Master: The master is a single process that runs on an entirely separate machine for security purposes. It stores only metadata: chunk locations, file-mapping information and access control information. The client first contacts the master for metadata and then connects to the relevant chunk server.
3. Metadata: Metadata is stored in the memory of the master; therefore, master operations are very fast. Metadata contains three types of information:
• Namespaces of file and chunk
• Location of each chunk
• Mapping from file to chunk
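The read path described above (client asks the master for metadata, then talks to the chunk servers directly) can be sketched as a toy model in Python. This is illustrative only, not real GFS code; the class names and the 64 MB chunk arithmetic simply mirror the description above.

```python
# Toy model of the GFS read path; not real GFS code.
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks

class Master:
    """Holds only metadata: file -> chunk handles -> chunk-server locations."""
    def __init__(self):
        self.file_to_chunks = {"/logs/web.log": ["chunk-001", "chunk-002"]}
        self.chunk_locations = {"chunk-001": "chunkserver-a",
                                "chunk-002": "chunkserver-b"}

    def lookup(self, path, offset):
        handle = self.file_to_chunks[path][offset // CHUNK_SIZE]
        return handle, self.chunk_locations[handle]

class ChunkServer:
    """Stores the actual chunk data; the master never serves file contents."""
    def __init__(self, data):
        self.data = data

    def read(self, handle, offset_in_chunk, length):
        return self.data[handle][offset_in_chunk:offset_in_chunk + length]

# Client: ask the master once for metadata, then read from the chunk server directly.
master = Master()
servers = {"chunkserver-a": ChunkServer({"chunk-001": b"GET /index.html 200\n"})}

handle, location = master.lookup("/logs/web.log", offset=0)
print(servers[location].read(handle, 0, 19))
```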
Q.15. Explain HDFS Architecture.
Ans:
HDFS is a DFS based on GFS that provides high-throughput access to application data. It uses commodity hardware with the expectation that failures will occur, and provides portability across heterogeneous hardware and software platforms. HDFS belongs to the Apache Hadoop open-source project, which is very popular these days for its ability to handle big data.
The Hadoop core consists of two modules:
1. Hadoop distributed file system: Used for storing huge amount of data.
2. MapReduce programming model: Used for processing large sets of data.
The working architecture of HDFS is very similar to that of GFS.
An HDFS cluster consists of multiple commodity machines that can be classified into the following three types:
1. Name node (runs on master machine)
2. Secondary name node or backup node (runs on separate machine)
3. Data node (runs on slave machine)
HDFS works as a master-slave architecture. Here the master is the name node, which holds the metadata of the cluster, while the processing happens through the data nodes. The client first contacts the name node for metadata, receives information about the relevant data nodes, and from then on connects to the data nodes directly. GFS works the same way.
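The same name node/data node flow can be exercised from the standard HDFS shell. Below is a minimal sketch, assuming a running Hadoop installation with the hdfs command on the PATH and a hypothetical local file report.csv; the fsck command at the end shows how the file was split into blocks and where those blocks are stored.

```python
# Assumes a running HDFS cluster and the "hdfs" CLI on the PATH.
import subprocess

def hdfs(*args):
    """Run an HDFS shell command and fail loudly if it errors."""
    subprocess.run(["hdfs", *args], check=True)

# The name node records the metadata; the data nodes store the actual blocks.
hdfs("dfs", "-mkdir", "-p", "/user/demo")
hdfs("dfs", "-put", "report.csv", "/user/demo/report.csv")   # hypothetical local file
hdfs("dfs", "-cat", "/user/demo/report.csv")

# Show how the file was split into (by default) 128 MB blocks and where they live.
hdfs("fsck", "/user/demo/report.csv", "-files", "-blocks", "-locations")
```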
Q.16. State the comparison of features between HDFS and GFS.
Ans: Refer to the HDFS vs GFS comparison table at the beginning of this post.