It’s a common tactic to combine two technologies for synergy’s sake, but Lentiq has a genuinely unusual idea: combining the concept of the data lake with edge computing into what it calls “interconnected micro data lakes,” or data pools.
“Data pools” are micro data lakes that function like a full data lake while supporting popular apps such as Apache Spark, Apache Kafka, and StreamSets software, or “everything a data scientist or data engineer needs,” according to the company.
The data pools exist independently across different clouds, and governance rules are enforced only when data moves between them, so each department has the tools it needs for its use cases and access to the data it needs.
Lentiq says customers can build a Data Lake as a Service with EdgeLake in under 10 minutes, and the service includes data and metadata management, application management, notebook sharing, data sharing, and infrastructure and budget management.
“Lentiq EdgeLake’s goal is to allow as many users as possible inside an organization to access data and to offer the environment where one can perform analytics and machine learning in a friendly manner. We strongly believe transformative innovation can only be achieved through a human-centric machine learning approach for all data projects,” the company said in a blog post announcing the product.
On the surface, the idea seems contradictory, because data lakes are central repositories. Data lakes are a newer take on the mass repository we first saw with data warehouses, but they operate very differently.
For starters, a data lake can hold unstructured data, like images, PDFs, audio, logs, and so forth, whereas a data warehouse holds highly structured row-and-column data. Second, unlike a data warehouse, a data lake does not require special hardware or software. You can use any device that supports a flat file system, even a mainframe if you want.
The big difference, though, is that in a data warehouse, you process the data before it goes into storage. With a data lake, you fill it with whatever and process it later when you need it.
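To make the contrast concrete, here is a minimal sketch of the two approaches, often called schema-on-write and schema-on-read. It assumes records arrive as JSON strings; the record fields (`car_id`, `speed_kmh`) and function names are hypothetical and are not part of Lentiq's product.

```python
import json

# Hypothetical raw input: two valid JSON sensor records and one stray log line.
RAW = [
    '{"car_id": "A1", "speed_kmh": 88}',
    '{"car_id": "B2", "speed_kmh": 131}',
    'not-json debug line',
]

def warehouse_load(raw_records):
    """Schema-on-write: parse and validate BEFORE storing; bad rows never land."""
    table = []
    for line in raw_records:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # rejected at load time
        table.append((rec["car_id"], int(rec["speed_kmh"])))
    return table

def lake_load(raw_records):
    """Schema-on-read: store everything exactly as it arrived."""
    return list(raw_records)

def lake_query_fast_cars(lake, limit_kmh):
    """Apply a schema only at query time, skipping anything that doesn't parse."""
    result = []
    for line in lake:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in the lake; this query just ignores it
        if rec.get("speed_kmh", 0) > limit_kmh:
            result.append(rec["car_id"])
    return result
```

The warehouse ends up with two clean rows, while the lake keeps all three raw lines and defers interpretation until someone asks a question such as `lake_query_fast_cars(lake_load(RAW), 120)`.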
And that’s where the concept runs counter to the edge. The edge is supposed to act as a filter for unnecessary data. An edge system collecting car data, for instance, doesn’t want sensor readings saying everything is normal; it wants the unusual or aberrant. Only that gets sent up to the main data center. In other words, the edge processes data before it is stored, which is how a data warehouse operates.
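The filtering role described above can be sketched in a few lines. This is a hypothetical illustration, assuming each reading carries a sensor name and a numeric value and that "normal" means falling inside a fixed range; the sensor names and limits are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    sensor: str
    value: float

# Hypothetical normal operating ranges; real limits would come from the vehicle maker.
NORMAL_RANGES = {
    "coolant_temp_c": (70.0, 105.0),
    "tire_pressure_psi": (30.0, 36.0),
}

def is_aberrant(r):
    """True if the reading falls outside its normal range."""
    if r.sensor not in NORMAL_RANGES:
        return True  # no known range, so forward it rather than silently drop it
    low, high = NORMAL_RANGES[r.sensor]
    return not (low <= r.value <= high)

def edge_filter(readings):
    """Forward only aberrant readings upstream; normal ones are dropped at the edge."""
    return [r for r in readings if is_aberrant(r)]
```

Forwarding readings from unknown sensors is a deliberate choice in this sketch; a stricter edge node might drop them instead.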
So it will be interesting to see how far Lentiq pushes this. I suspect the company is using the term “edge” because it’s the hot buzzword, when it is really targeting departments and remote offices.
Thanks to Andy Patrizio (see source)