
Technology

Abstract
Infinity is a planet-scale analytical database with blazing fast SQL queries over opaque content. This document describes the architecture of Infinity and the design choices and innovations that enable highly efficient execution of complex distributed queries across diverse opaque data sets such as video, audio, images, and text.

Introduction
The video, audio, VR, image, and text content generated by consumers, cameras, devices, ambient sensors, and autonomous agents is exploding at an unprecedented rate.
There is a growing class of applications that make use of large video, audio and image collections. These range from robotics to retail, and from drone photography to automated grocery checkout -- all exciting examples of opaque data analytics, because they require processing huge numbers of videos, audio clips and images with computationally expensive deep learning algorithms.
Unlike content on the public internet that gets indexed by search engines like Google and Bing, the enterprise content persisted in cloud storage systems like S3, Box, Azure Blob Storage and Dropbox is largely opaque, unindexed, and dark. There is a tremendous amount of queryable information buried inside opaque data types like video, audio, images, PDFs and text in these cloud storage systems, waiting to be unearthed using the power of deep learning.
- As of 2019, more than 500 hours of video were uploaded to YouTube every minute worldwide. That's 30,000 hours of video uploaded every hour and 720,000 hours every day. About 1 billion hours of YouTube video are watched every single day. (A playlist of 1 billion hours of video would run for over 100,000 years.)
- On an average day, videos are viewed more than 8 billion times on Facebook.
- 57% of all files stored in Microsoft's OneDrive are photos and videos. OneDrive consumers bring in around 250 million photos a day, coming from Windows and mobile camera backup.
- Some researchers estimate that the world's total continuous visual capture capability from streaming security video cameras will reach roughly 12 billion video streams by 2030. That is about 6×10^17 pixels ≈ 600 petapixels (600 quadrillion pixels).
Planet-scale analytical database with SQL queries over opaque content and deep learning at its core
Infinity is a planet-scale analytical database with blazing fast SQL queries over opaque content; it is elastically scalable and uses deep learning at its core. With Infinity, previously opaque data types like video, images (including PDFs), audio clips, and text are no longer opaque blobs; they are transformed into intelligent, queryable data types. Infinity transforms opaque content into intelligent content by extracting, inferring, detecting, and indexing entities and metadata, and making it queryable with ANSI SQL extensions. Infinity is a fully managed, globally distributed database service with industry-leading SLAs: 99.999% availability, 99.999% durability, and single-digit-millisecond latency at P99. Infinity operates across all three major cloud providers. Infinity allows customers to elastically and independently scale storage and throughput to provide the highest query throughput per dollar in the industry.

Data Model
As content gets processed, extracted metadata (structural and semantic) is stored as JSON documents. In traditional relational database systems, you must define a schema, which describes the tables in a database, the type of data that can be stored in each column and the relationships between tables. In contrast, each document describes its own structure along with the actual data, allowing flexibility and variation in the structure of each document. Using a document model, Infinity can perform schemaless ingestion of data in JSON, XML, Avro, Parquet and other semi-structured formats.
​
Infinity differs from typical document-oriented databases in that it indexes and stores data in a way that can support SQL queries on unstructured, semi-structured and structured data. Our architecture is designed from the ground up so that schematization is not necessary at all, and the storage format is efficient even if different documents have data of different types in the same field. A schema is not inferred by sampling a subset of the data - rather, the entire data set is indexed so when new documents with new fields are added, they are immediately exposed in the form of a smart schema to users and made queryable using SQL in Infinity. This means new documents and new metadata extracted from opaque data are never rejected if they unexpectedly show up with new fields or data formats.
Extracted metadata (represented as JSON documents) can be complex, with nested data that provides additional sub-categories of information about the video/audio/image/text objects. Infinity supports arrays and objects, which can contain any value and hence be nested recursively. Infinity is designed to allow efficient ingestion, indexing and querying of deeply nested data.
In the case of nested JSON, the documents can be “flattened” in effect, so that nested arrays can be queried elegantly. Since traditional SQL specifications do not address nested documents, we have added certain custom extensions to our SQL interface to allow for easy querying of nested documents and arrays. In addition, the ability of Infinity to perform schemaless ingestion completely eliminates the need for upfront data modeling.
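As a sketch of what querying nested data could look like, assume a hypothetical collection of video documents whose detected_objects field is an array of objects; the UNNEST-style flattening extension and all names below are illustrative assumptions, not the definitive Infinity syntax:

-- flatten the nested detected_objects array: one row per detected object
SELECT v.video_id, obj.label, obj.confidence
FROM videos AS v, UNNEST(v.detected_objects) AS obj
WHERE obj.confidence > 0.9
ORDER BY obj.confidence DESC

Here each element of the nested array is exposed as if it were a row, so filters and ordering apply to individual detections rather than to the whole document.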
​
The basic data primitive in Infinity is a document. A document has a set of fields and is uniquely identified by a document id. Collections allow you to manage a set of documents and can be compared to tables in the relational world.
Strong dynamic typing
Types can be static (known at compile time, or, in our case, when the query is parsed, before it starts executing) or dynamic (known only at runtime). A type system can be weak (the system tries to avoid type errors and coerces eagerly) or strong (type mismatches cause errors, coercion requires explicit cast operators). For example, MySQL has a weak, static type system. (2 + '3foo' is 5; 2 + 'foo' is 2). PostgreSQL has a strong, static type system (it doesn't even convert between integer and boolean without a cast; SELECT x FROM y WHERE 1 complains that 1 is not a boolean).
Infinity has strong dynamic typing, where the data type is associated with each individual field value rather than with an entire column. This means you can execute strongly typed queries on dynamically typed data, making it easier to work with fluid datasets. This is useful not only in dynamic programming languages like Python and Ruby, but also in statically typed C++ and Java applications, where any type mismatch would be a compile-time error.
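A minimal sketch of what this means in practice, assuming a hypothetical media_files collection in which some documents store duration as a number and others as a numeric string:

-- explicit cast: strong typing means mixed-type values are never silently coerced
SELECT doc_id, duration
FROM media_files
WHERE CAST(duration AS float) > 60.0

Because typing is strong, the explicit CAST states the intent; a value that cannot be converted surfaces as an error rather than being silently coerced the way a weakly typed system would handle it.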
Cloud-Native Architecture
One of the core design principles behind Infinity is to exploit hardware elasticity in the cloud. Traditional databases built for data centers assume a fixed amount of hardware resources irrespective of the load, and are then designed to optimize throughput and performance within that fixed cluster. With cloud economics, however, it costs the same to rent 100 machines for 1 hour as it does to rent 1 machine for 100 hours to do a certain amount of work. Infinity has been architected to optimize the price-performance ratio by aggressively exploiting the fundamentals of cloud elasticity.
​
Infinity's cloud-native architecture allows it to scale dynamically to make use of available cloud resources. A data request can be parallelized, and the hardware required to run it can be acquired instantly. Once the necessary tasks are scheduled and the results returned, the platform promptly sheds the hardware resources used for that request. The key components are containerized and orchestrated with Kubernetes for a cloud-agnostic approach.
​
Some of the key guiding principles behind Infinity's design include:
​
- Use of shared storage rather than shared-nothing storage: Cloud services such as Amazon S3 provide shared storage that can be simultaneously accessed from multiple nodes using well-defined APIs. Shared storage enables Infinity to decouple compute from storage and scale each independently. This ability helps us build a cloud-native system that is orders of magnitude more efficient.
- Disaggregated architecture: Infinity is designed to use only as much hardware as is truly needed for the workload it is serving. The cloud offers us the ability to utilize storage, compute and network independently of each other. Infinity's services are designed to tune the consumption of each of these hardware resources independently. Additionally, a software service can be composed from a set of microservices, with each microservice limited by only one type of resource, in keeping with the disaggregated architecture.
- Resource scheduling to manage both supply and demand: A traditional task scheduler typically only manages demand by scheduling task requests among a fixed set of available hardware resources. Infinity's cloud-native resource scheduler, on the other hand, manages demand along with supply. It can request new hardware resources to be provisioned to schedule new task requests based on the workload and configured policies. Once done serving the request, the resource scheduler sheds the newly provisioned hardware as soon as it can to optimize for price-performance.
- Separation of durability and performance: Maintaining multiple replicas was the way to achieve durability in pre-cloud systems. The downside, of course, is the added cost of additional server capacity. With a cloud-native architecture, we can use the cloud object store to ensure durability without requiring additional replicas. Multiple replicas can aid query performance, but they can be brought online on demand, only when there is an active query request. By using cheaper cloud object storage for durability and only spinning up compute and fast storage for replicas when needed for performance, Infinity can provide better price-performance.
- Ability to leverage the storage hierarchy: Infinity is designed to take advantage of the range of storage tiers available in the cloud. It uses hierarchical storage with the help of RocksDB-Cloud, a high-performance embedded storage engine optimized for SSDs. A RocksDB-Cloud instance automatically places hot data on SSDs and cold data in cloud storage. The entire database storage footprint need not be resident on costly SSDs: cloud storage contains the entire database, and local storage contains only the files in the working set.
RocksDB-Cloud
RocksDB-Cloud is Infinity's embedded durable storage engine. It extends the core RocksDB engine, used in production at Facebook, LinkedIn, Yahoo, Netflix, Airbnb, Pinterest and Uber, to make optimal use of hierarchical storage from SSDs to cloud storage.
RocksDB-Cloud is fully compatible with RocksDB, with the additional feature of continuous and automatic replication of database data and metadata to cloud storage (e.g. Amazon S3). In the event that the RocksDB-Cloud machine dies, another process on any other EC2 machine can reopen the same RocksDB-Cloud index.
A RocksDB-Cloud instance is also cloneable. RocksDB-Cloud supports a primitive called zero-copy clone that allows another instance of RocksDB-Cloud on another machine to clone an existing database. Both instances can run in parallel while sharing a set of common database files.
Disaggregated Content Processing from Distributed Query Processing
Infinity employs an architecture where Content Processing is completely separate from Distributed Query Processing. Such an architecture and its variants are favored by web-scale companies like Facebook, LinkedIn, and Google for their efficiency and scalability. Distributed Query Processing is a high-performance serving layer that can serve complex queries, eliminating the need for complex data pipelines.

Content Synchronization, Content Understanding, Storage and Indexing and Distributed Query subsystems run as discrete microservices in disaggregated fashion, each of which can be independently scaled up and down as needed. The system scales Content Synchronization when there is more data to ingest, scales DAGs in Content Understanding when data size grows, and scales Distributed Query Processing when the number or complexity of queries increases. This independent scalability allows the system to ingest data, run computational graphs (DAGs) and data pipelines, maintain indexes and execute queries in an efficient and cost-effective manner.
Separation of Storage and Compute
Traditionally, systems have been designed with tightly coupled storage and compute, with the aim of keeping storage as close to compute as possible for better performance. With improvements in hardware over time, along with the onset of cloud-based architectures, this is no longer a binding constraint. Tight coupling of storage and compute makes scaling these resources independently a challenge, in turn leading to overprovisioning of resources. Separating compute and storage, on the other hand, provides benefits such as improved scalability, availability and better price-performance ratios. It allows you to scale out your cluster to adapt to variable workloads in a shorter span of time.
​
While storage-compute separation has been embraced by recent cloud data warehouse offerings, it is a novel concept for application backends. Infinity is unique in enabling users to benefit from storage-compute separation when serving data-driven applications.
​
Decoupling storage and compute requires the ability to store persistent data on remote storage. RocksDB-Cloud allows us to provide this separation between compute and storage. All of RocksDB's persistent data is stored in a collection of SST (sorted string table) files in cloud storage. A compaction process combines a set of SST files to generate new ones, purging overwritten or deleted keys. Compaction is a compute-intensive process. Traditionally, compaction happens on CPUs local to the server that hosts the storage. However, since SST files are not modified once created, we can separate compaction compute from storage. This is achieved using remote compaction, wherein the task of compaction is offloaded to a stateless RocksDB-Cloud server once a compaction request is received. These remote compaction servers can be auto-scaled based on the load on the system.
Independent Scaling Of Ingest, Content Understanding And Query Compute
A further characteristic of the Infinity architecture, enabled by its decoupled Distributed Query Processing and Content Processing tiers, is the ability to scale ingest and query compute independently of each other. This permits Infinity to handle spikes in data ingestion or query volume flexibly without overprovisioning compute resources.
In practice, a bulk load would result in more ingest compute being dynamically spun up to keep ingest latency to a minimum, while a surge in queries due to an event, for instance, can be handled by allocating more query compute to ensure low query latency.
Sharding And Replication
The data in Infinity is horizontally partitioned. Each document in a collection is mapped to an abstract entity called a microshard. This mapping is performed using a microshard mapping function, which is a function of the document ID. A group of microshards comes together to form an Infinity shard. Document-based sharding makes it easier for the system to scale horizontally. Each shard maps one-to-one to an index instance, which holds the indexed data in the form of lexicographically sorted key-value pairs.
​
Splitting data in large collections into multiple shards allows you to leverage shard-level parallelism, where each shard can be scanned in parallel, helping serve queries on such collections faster without scans being a bottleneck. Infinity allows you to have multiple replicas of the same shard for availability.
Mutability Of The Index
Infinity is a mutable database. It allows you to run mutation operations on your video, audio, image and text objects in your database.
​
Several traditional columnar databases also support updates (called trickle-loading). But since millions of values from the same column may be columnar-compressed and stored in a single compact object, if you need to update a record that lies in the middle of such a compact object, you would have to do a copy-on-write for that object. Objects are typically hundreds of megabytes in size, and if updating a 100-byte record in the middle of a compact object triggers a copy-on-write cycle of hundreds of megabytes, such a system cannot sustain more than a trickle of occasional updates. Most users will prefer to add a new partition to an existing table to record the updates and then at the time of query, merge these partitions together. When performing updates in Infinity, users avoid this additional level of complexity.
​
Infinity's underlying storage engine, RocksDB-Cloud, is a key-value store designed so that any field can be updated natively.
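As an illustrative sketch only (the UPDATE statement form and all names here are hypothetical, not a documented Infinity API), a single-field mutation conceptually becomes a targeted key-value write against the fragment holding that field, rather than a copy-on-write of a large columnar object:

-- single-field mutation: only the fragment holding review_status is rewritten
UPDATE video_metadata
SET review_status = 'approved'
WHERE _id = 'vid_8c21'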
Schemaless Ingestion
Overview Of Data Ingestion Flow
Data can come in either via the write API or through the various data sources that Infinity supports.
​
When data comes in via the write API, it is routed to an API server. The API server is an HTTP server that processes REST operations and acts as the frontend to the Infinity cluster. It takes the data requests, performs permission checks, and writes the data sequentially to a distributed log store. This data is then tailed and indexed by the leaf nodes. Leaves are EC2 machines with local SSDs that hold the indexed data that is later used to serve queries.
​
Heavy writes can impact reads, and since we want to serve operational analytics queries well, reads need to be fast. The distributed log store acts as an intermediate staging area for the data before it is picked up by the leaves and indexed. It also provides data durability until the data is indexed by the leaves and persisted to S3.
​
Ingesters read the data from data sources and may optionally perform user-requested transformations on this data before writing it to the log store. The transformer service runs as part of the ingester. It lets you drop or map fields.
​
In the case of bulk ingest, when a significantly large amount of data is being ingested, the data is written to S3 instead and just the headers are written to the log store, so as not to make the log store a memory bottleneck.
Content Transformations and Content Understanding
Every ingested (video/audio/image/text) file is passed through a series of transformation steps. The results of these transformations are stored as fields in the document(s). These transformations are applied to incoming content on all the write paths.
​
Our comprehensive AI extracts key features from video such as actions, objects, on-screen text, speech, and people, and transforms all of that information into vector representations. Vectors enable fast and scalable semantic search.
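As a sketch of how those vector representations might be queried (the similarity function, parameter placeholder, and collection/field names below are assumptions, not confirmed Infinity syntax), a semantic search over per-segment embeddings could look like:

-- rank video segments by embedding similarity to a query vector
SELECT video_id, segment_start, segment_end
FROM video_segments
ORDER BY COSINE_SIMILARITY(embedding, :query_embedding) DESC
LIMIT 10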
Built-in Connectors
Integrations allow you to securely connect external data sources such as DynamoDB, Kinesis, S3, Google Cloud Storage and more to your Infinity account. Infinity will then automatically sync your collections (based on the user-defined schedule) to remain up-to-date with respective data sources, usually within a matter of seconds/minutes.
​
Note that even if your desired data source is not currently supported, you can always use the Infinity API to create and update collections. However, you would have to manage your data syncing manually.
​
Write API
The Write API is a REST API that allows you to add new sources to a collection in Infinity by sending an HTTPS POST request to Infinity's write endpoint.
​
Apache Kafka
The Kafka connector helps you load data from Kafka into Infinity. Once users create an empty collection and configure the Infinity Kafka Connect plugin to point to their Kafka topic and Infinity collection, Kafka documents will start being written to Infinity. Only valid JSON and Avro documents can be read from Kafka and written to Infinity collections using this connector.
​
Amazon S3
The Amazon S3 connector lets you load data from an Amazon S3 bucket into an Infinity collection. In order to use an Amazon S3 bucket as a data source in Infinity, all you need to do is create an Amazon S3 integration to securely connect buckets in your AWS account with Infinity and then create a collection which syncs your data from an Amazon S3 bucket into Infinity.
​
Google Cloud Storage
This connector lets you use a Google Cloud Storage bucket as a data source in Infinity. To use Google Cloud Storage as a data source, first create a Google Cloud Storage integration to securely connect buckets in your GCP account with Infinity followed by a collection which syncs your data from a Google Cloud Storage bucket into Infinity.
​
Bulk Load
Ingesting bulk-sourced data requires Infinity to keep up with the rate at which data floods in. In order to allow a smooth and efficient ingest experience while ingesting bulk-sourced data, Infinity provides a separate bulk load mode.
​
In the normal course of operations, an ingester worker serializes a source object and writes it straight to the log store. However, if the input dataset is tens of gigabytes in size, this can stress the log store. In bulk load mode, Infinity uses S3 as a bulk-data queue instead of the log store; only the metadata is written to the log store. This mode also leverages larger leaf pods so that RocksDB can be configured with larger memtables. To index the data, the leaves tail the messages from the log store, examine the message header, locate the file either from the message itself or from the corresponding S3 object, and index it into RocksDB-Cloud. The bulk leaves also fully compact the data and then start moving this compacted data over to regular leaves once the collection is ready to serve queries.
Adaptive Indexing Ensemble
Infinity is built using an ensemble of indexes, which is a combination of:
- Inverted index
- Column-based index
- Row-based index
- Spatiotemporal index
- Vector index
​
As a result, it is optimized for multiple access patterns, including key-value, time-series, document, search and aggregation queries. The goal of the Indexing Ensemble is to optimize query performance, without knowing in advance what the shape of the data is or what type of queries are expected. This means that point lookups and aggregate queries both need to be extremely fast. Our P99 latency for filter queries on terabytes of data is in the low milliseconds.
​
The following examples demonstrate how different indexes can be used to accelerate different types of queries:
​
Query 1:
SELECT keyword, count(*) c
FROM search_logs
GROUP BY keyword
ORDER BY c DESC
​
Query 2:
SELECT *
FROM search_logs
WHERE keyword = 'infinity'
AND locale = 'en'
​
For Query 1, which aggregates over the entire collection but touches only the keyword column, the optimizer will choose a plan that scans the column-based index, reading just that one column. For Query 2, the optimizer will use the database statistics to determine that the query needs to fetch only a tiny fraction of the database and will decide to answer it with the inverted (search) index.
Indexing Data In Real Time
The index is a live, real-time index that stays in sync with multiple data sources and reflects new data in less than a second. It is a covering index, meaning it does not federate back to the data source for serving any queries. This is essential for predictably delivering high performance.
​
Traditionally, maintaining live indexes for databases has been an expensive operation, but Infinity uses a modern cloud-native approach, with hierarchical storage and a disaggregated system design, to make this efficient at scale. Conceptually, our logical inverted index is separated from its physical representation as a key-value store, which is very different from other search indexing mechanisms. We also make the converged index highly space-efficient by using delta encoding between keys and ZSTD compression with dictionary encoding per file. In addition, we use Bloom filters to quickly find keys: a Bloom filter with 10 bits per key has roughly a 1% false-positive rate, which eliminates about 99% of unnecessary I/O.
​
Our indexes are fully mutable because each key refers to a document fragment; this means the user can update a single field in a document without triggering a reindex of the entire document. Traditional search indexes tend to suffer from reindexing storms because even if a 100-byte field in a 20 KB document is updated, they are forced to reindex the entire document.
Time-Series Data Optimizations
For time-series data, Infinity's rolling window compaction strategy allows you to set policies to keep only the data from the last x hours, days or weeks actively indexed and available for querying. Our time-to-live (TTL) implementation is built in using compaction filters that automatically drop older data, which means we do not need separate I/O and additional resources to run a loop that scans older data and deletes it. Traditional databases have been known to blow out the database cache and use a lot of additional resources in order to support this behavior. We support sortkey behavior for event-series data based on a specified timestamp or the document creation time. To increase efficiency, our Converged Indexing engine stores the sortkey as part of the key-value store itself, so we do not need to retrieve all relevant documents as part of a query and then sort them in memory.
- At collection creation time, the user can optionally map a field as the event-time.
- When event-time is not explicitly specified, the document creation time is used as the event-time to determine system behavior.
- If users want to specify retention for a collection (e.g. automatically purge all records older than 90 days), we determine how old a record is based on the event-time field.
- All queries are, by default, sorted by event-time descending.
- All queries that have range clauses on event-time are significantly faster than similar queries on regular fields.
- All queries that include an "order by" event-time descending are significantly faster than similar queries requiring a sort on regular fields (see the example query below).
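As a sketch (the camera_events collection, the _event_time field name and the interval syntax are illustrative assumptions), a typical time-bounded query that benefits from the event-time sortkey looks like:

-- range filter and descending order both match the event-time sortkey
SELECT camera_id, label, _event_time
FROM camera_events
WHERE _event_time > CURRENT_TIMESTAMP() - INTERVAL 24 HOUR
ORDER BY _event_time DESC
LIMIT 100

Because both the range filter and the descending order follow the sortkey layout described above, results can be streamed directly from the key-value store without an in-memory sort.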
Query Processing
Infinity provides a full SQL interface to query the data, including filters, aggregations and joins. When a query comes in, it hits the Infinity API server and gets routed to an aggregator.
​
Broadly speaking, a SQL query goes through three main stages in Infinity:
- Planning
- Optimization
- Execution
​
The aggregator plans the query and routes various fragments of the plan to appropriate leaves that hold the data to serve this query. The results are routed back to this aggregator which then sends the results back to the API server. We introduce additional levels of aggregators to distribute the processing across multiple aggregators for queries that may make a single aggregator a memory/computation bottleneck.
​
The following subsections go over the design of the query processing architecture, given the unique challenges of working with dynamically typed data and the performance requirements of millisecond-latency analytics queries.
Query Planning
The first step before the planning phase is query parsing. The parser checks the SQL query string for syntactic correctness and then converts it to an abstract syntax tree (AST). This AST is the input to the query planner.
​
In the planning stage, a set of steps that need to be executed to complete the query is produced. This set of steps is called a query plan. The final query plan selected for execution is called the execution plan.
Query Optimization
The job of the query optimizer is to pick an execution plan with the optimal execution runtime. Infinity uses a Cost Based Optimizer (CBO) to pick an optimal execution query plan. It starts with all possible query plans in its search space and evaluates each of them by assigning a “cost” to every plan. This “cost” is mainly a function of the time required to execute that plan. The final cost of the plan is computed by breaking the query plan into simpler sub-plans and costing each of them in turn. The cost model uses information about the underlying data, such as total document count, selectivity, and distribution of values to guide the estimation of the cost of a plan.
​
A recursive in-memory data structure called Memo is used to efficiently store the forest of query plan alternatives generated during query planning. Plan alternatives are generated by applying a set of normalization and exploration rules to the plan nodes.
​
Normalization is used mainly to simplify expressions, transform equivalent expressions to a canonical form, and apply optimizations that are believed to always be beneficial in order to save the CBO some work. We have implemented our own rule specification language (RSL) to express these normalization rules. We convert these RSL rules to C++ code snippets using our own RSL compiler.
​
Exploration happens as part of the query optimization stage. During this phase, the various plan alternatives are costed by costing dependent memo groups recursively, starting at a Memo’s root group. It is during this phase that the most efficient join strategy, join ordering or access path would be picked.
Distributed Query Execution
The execution plan obtained as a result of exploration is simply a DAG of execution operators. For instance, an access-path operator is responsible for fetching data from the index (e.g. a ColumnScan operator, which fetches data by scanning the columnar index), while an aggregation operator performs aggregation computations on the data fed to it. In this phase, the execution plan needs to be further prepared for distributed execution. The execution plan is first divided into fragments, each of which comprises a chain of execution operators.
​
There are primarily two classes of fragments:

- Leaf fragments: These are fragments of the plan that would typically be associated with retrieving data from the underlying collection. Leaf fragments are dispatched to and executed on leaf workers where shards of the collection reside.
- Aggregator fragments: These are fragments of the plan that perform operations such as aggregations and joins on the data flowing from leaf fragments, and relay the final results to the API server. They are dispatched to and executed on aggregator workers.
​
The leaf fragment needs to be executed on a minimal covering of the shards that comprise the collection. Each of these leaf fragments can be executed in parallel offering shard-level parallelism.
​
The workers to assign these fragments are picked with the goal of keeping the load distributed among all the workers across queries. If a certain worker is unavailable to process a request, the execution engine retries scheduling the fragment on a different worker, thus ensuring that the query does not fail.
​
Once the fragments are scheduled to the respective workers, the execution proceeds bottom-up in a non-blocking manner. No operator blocks on its input, and the operator receiving data is not forced to buffer an arbitrary amount of data from the operators feeding into it. Every operator can request a variable number of data chunks/pages from its predecessor, thus providing a way to implement non-blocking back pressure. The data exchange between operators happens in the form of data chunks, which organize data in a columnar format. This also makes it feasible for the operators to execute in a vectorized manner, where operations are performed on a set of values instead of one value at a time, for better performance.
Application Development
Infinity provides client SDKs that you can use to create collections, create integrations, load documents, and query collections. The following client SDKs are supported:
- Node.js
- Python
- Java
- Golang
REST API
Infinity provides a REST API that allows you to create and manage all the resources in Infinity. The endpoints are accessible only via HTTPS.
​
Some of the operations supported via the REST API are:
- Create Collection: Create a new collection in a workspace.
- List Collections: Retrieve all collections in an org/workspace.
- Delete Collection: Delete a collection and all its documents from Infinity.
- Create Workspace: Create a new workspace in your org.
- List Databases: List all databases.
- Delete Database: Remove a database.
- Query: Make a SQL query to Infinity.
- Create API Key: Create a new API key for the authenticated user.
- List API Keys: List all API keys for the authenticated user.
- Delete API Key: Delete an API key for the authenticated user.
- Create Integration: Create a new integration with Infinity.
- List Integrations: List all integrations for an organization.
- Delete Integration: Remove an integration.
Security
Use Of Cloud Infrastructure
Conclusion
Infinity takes a new approach to opaque data analytics, by bringing together:
​
- Cloud-native architecture: Infinity was built to enable users to take maximum advantage of cloud efficiencies through a fully managed database service and independent scaling of individual compute and storage components.
- Schemaless ingestion: Ingest semi-structured, nested data from databases, data streams and data lakes without the need for pre-defined schema. Ingest data continuously using fully managed connectors to common data sources or Infinity Write API.
- Adaptive indexing ensemble: Infinity automatically indexes all ingested data in multiple ways (inverted, column-based and row-based) to accelerate queries without requiring advance knowledge of query patterns.
- Full-featured SQL: Compose queries and create APIs using declarative SQL, without the need to transform your data. Use the full functionality of SQL, including joins, optimized by Infinity's query engine.
​
Enabling developers to build real-time applications on data quickly and simply is what we, at Infinity, work towards every day. We believe you should be bottlenecked only by your creativity and not what your data infrastructure can do. Get started building with Infinity.

