Cassandra is a highly scalable, high-performance database with robust support for clusters spanning multiple datacenters. It is a peer-to-peer distributed system: data is distributed among all the nodes in a cluster, and each node is independent yet interconnected with the others. Cassandra is designed for continuous availability and a low total cost of ownership, which makes it the right choice when you need scalability and high availability without compromising performance. Cassandra is responsible for distributing data across multiple machines in an application-transparent manner, and it automatically repartitions data as machines are added to and removed from the cluster.
Miri Infotech is launching a product that configures and publishes Cassandra as a free, scalable, highly available distributed database. It is delivered as a pre-configured, ready-to-launch AMI on Amazon EC2 that is built on Ubuntu and contains Cassandra and Hadoop.
One of the things Cassandra's users value most is that the database has no single point of failure and no network bottlenecks.
Apache Cassandra is a free and open-source distributed NoSQL (Not Only SQL) database management system designed to handle large amounts of data across many commodity servers. Its high performance makes it powerful and flexible enough for a wide range of users.
Cassandra has no master node. Instead, all nodes play an identical role and communicate with each other as equals. A cluster can handle thousands of concurrent users or operations per second.
Many companies with large, active data sets have successfully deployed and benefited from Apache Cassandra, including Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more.
Cassandra has a peer-to-peer distributed architecture, and data is distributed among all the nodes in a cluster. All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected with the other nodes. Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read and write requests can be served by other nodes in the network.
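How data can be spread across identical nodes, and how little of it moves when a node joins, can be sketched with a toy consistent-hash ring. This is a minimal illustration rather than Cassandra's actual partitioner: the `Ring` class, the node names, the number of virtual nodes, and the use of MD5 to derive tokens are all assumptions made for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring in which every node owns several tokens (virtual nodes)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []   # sorted token positions
        self._owner = {}    # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def node_for(self, key):
        """Return the node owning the first token at or after the key's token."""
        idx = bisect.bisect_left(self._tokens, token(key))
        if idx == len(self._tokens):
            idx = 0  # wrap around to the start of the ring
        return self._owner[self._tokens[idx]]


keys = [f"user-{i}" for i in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

# Adding a machine only takes over the key ranges next to its new tokens.
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to node-d")
```

Any node can compute `node_for(key)` locally, which is why a request can be accepted by whichever node the client happens to contact and then forwarded to the owner.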
Some features of Cassandra are:
FAULT TOLERANT
Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
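Replica placement can be pictured as walking a token ring clockwise from the key's position and collecting the next distinct nodes. The sketch below assumes one token per node and a hypothetical `replicas_for` helper; it ignores data centers and racks, which a real multi-datacenter deployment also takes into account.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Ring position for a node name or key (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, replication_factor=3):
    """Pick up to `replication_factor` distinct nodes, walking the ring from the key."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
owners = replicas_for("user-42", nodes)

# If one replica fails, the remaining replicas still hold the data.
failed = owners[0]
survivors = [n for n in owners if n != failed]
print(owners, "->", survivors)
```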
PERFORMANT
Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.
DECENTRALIZED
There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.
SCALABLE
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
DURABLE
Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.
TUNABLE CONSISTENCY
Writes and reads offer a tunable level of consistency, all the way from “writes never fail” to “block for all replicas to be readable”, with the quorum level in the middle.
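The trade-off can be made concrete with a little arithmetic. With a replication factor RF, a write acknowledged by W replicas and a read that consults R replicas must overlap on at least one replica whenever R + W > RF. The helper names below are hypothetical; the level names mirror Cassandra's ONE, QUORUM, and ALL consistency levels, and only those three are modeled.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when read and write replica sets are forced to overlap (R + W > RF)."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
for w_level, r_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    ok = reads_see_latest_write(w_level, r_level, rf)
    print(f"write={w_level:<6} read={r_level:<6} overlap guaranteed: {ok}")
```

Lower levels favor availability and latency, while higher levels favor consistency; quorum reads combined with quorum writes sit in the middle and still guarantee overlap.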
MONITORING & ALERTING
Node monitoring and alerting for events of interest including performance and latency, disk capacity and node responsiveness. Customized alerting is also possible through our monitoring architecture.
This list is not exhaustive; a deeper look reveals many more features.
To understand what makes Cassandra such a capable database, it helps to study its components.
The key components of Cassandra are as follows:
- Node: the place where data is stored.
- Data center: a collection of related nodes.
- Cluster: a component that contains one or more data centers.
- Commit log: a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
- Mem-table: a memory-resident data structure. After the commit log, data is written to the mem-table. Sometimes a single column family has multiple mem-tables.
- SSTable: a disk file to which data is flushed from the mem-table when its contents reach a threshold value.
- Bloom filter: a quick, probabilistic algorithm for testing whether an element is a member of a set. It acts as a special kind of cache and is consulted on every query.
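The storage components listed above fit together on the write and read paths roughly as follows. This is a simplified in-memory sketch, not Cassandra's storage engine: `ToyNode`, `BloomFilter`, the flush threshold, and the hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyNode:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory copy of the most recent writes
        self.sstables = []     # list of (bloom filter, immutable sorted dict)
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for crash recovery
        self.memtable[key] = value            # 2. then update the mem-table
        if len(self.memtable) >= self.flush_threshold:
            self._flush()                     # 3. flush once the threshold is hit

    def _flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest SSTable first; skip tables whose filter rules the key out.
        for bloom, table in reversed(self.sstables):
            if bloom.might_contain(key) and key in table:
                return table[key]
        return None


node = ToyNode()
for i in range(7):
    node.write(f"k{i}", i)
print(len(node.sstables), "SSTables,", len(node.memtable), "keys in mem-table")
print(node.read("k1"), node.read("k6"), node.read("missing"))
```

The Bloom filter lets a read skip any SSTable that definitely does not contain the key, so only tables that might hold it are actually searched.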