With another year nearly behind us, it’s time to sit back and contemplate what we’ve just been through. It’s been another active year in the big data space, with plenty of news for the intrepid big data reader.
We’ve had an eventful year here at Datanami, which will soon complete the transition to BigDATAwire (keep your eyes out for that change in January). With that in mind, it’s worth taking a look at the top stories in each of the past 12 months. The rankings are according to pageviews.
January: All Eyes on Snowflake and Databricks in 2022
The new year kicked off with a lot of anticipation for what Databricks and Snowflake would do. The two companies didn’t disappoint, with a host of new capabilities and continued strong growth (although the much-anticipated Databricks IPO never materialized). These two data giants will be interesting to watch in 2023 too, although it will be tough to cover their respective user conferences in June, which take place on the same days (with Databricks in San Francisco and Snowflake in Las Vegas).
February: Snowflake, AWS Warm Up to Apache Iceberg
Apache Iceberg, the new open table format that solves a lot of consistency problems in big data lakehouses, came on strong in late 2021, and its usage grew through 2022. We named Ryan Blue, the co-creator of Iceberg, as one of our people to watch. Databricks, for what it’s worth, announced support for Iceberg later in the year (it also open sourced its Delta table format, providing competition to Iceberg, along with Apache Hudi).
March: Home Depot Finds DIY Success with Vector Search
Vector search was one of the most compelling new technologies to find traction in 2022. We got an inside view of how the technology (often deployed using vector databases) helped home improvement giant Home Depot supercharge its customers’ Web and mobile searches by using neural networks to infer what they’re looking for instead of maintaining a huge dictionary of commonly misspelled words.
April: The Modernization of Data Engineering at Capital One
Democratization of data science and data analysis may be the goal, but data engineering is often the path to get there. The folks at Capital One realize this, which is why the company has poured resources into data engineering to streamline access to data. Its internal data marketplace combines a data catalog, an automated data pipeline development tool, data governance, and data quality, and it’s held together with a data mesh.
May: Anaconda Unveils PyScript, the ‘Minecraft for Software Development’
Python has become the lingua franca for data science. That’s not news. But with Anaconda’s new PyScript, which CEO Peter Wang unveiled at the PyCon 2022 conference, the company helped to lower the barrier to creating data science applications in the comfort of a Web browser.
June: EMR Serverless Now Available from AWS
Apache Hadoop has long ceased being the center of gravity of the big data world. But Hadoop’s legacy lives on, including at AWS, where its Amazon EMR offering continues to be a smash hit among customers using Apache Spark, Apache Flink, Apache Hive, Presto, and even MapReduce code. And with its new serverless option, Amazon EMR (which used to stand for Elastic MapReduce but doesn’t officially anymore) helped to eliminate one of the big usability hurdles that plagued that old elephant Hadoop.
July: Mathematica Helps Crack Zodiac Killer’s Code
Sometimes, stories languish on Datanami for months before readers finally realize what they’ve been missing. Such was the case with this January 2022 story, which described how a trio of men from Virginia, Australia, and Belgium used the Mathematica statistical package from Wolfram to crack the Zodiac Killer’s code. Discover Magazine gets credit for first reporting this story. Unfortunately, the identity of the Zodiac Killer, the serial killer who terrorized Northern California more than half a century ago, remains unresolved.
August: Datanami People to Watch 2022
We first announced the 12 Datanami People to Watch back in February, and ran interviews with the group over the course of the year. It’s a great group of leaders, including Yu Xu (TigerGraph), Lauren Woodman (DataKind), Venkat Venkataramani (Rockset), Adam Selipsky (AWS), Matthew Scullion (Matillion), Satyen Sangani (Alation), Andrew Ng (LandingAI), Tristan Handy (dbt Labs), Susan Gregurick (NIH), Zhamak Dehghani (Thoughtworks), Joy Buolamwini (MIT Media Lab), and Ryan Blue (Tabular). Keep an eye out in early 2023 for the next batch.
September: Walmart Gives Data and Analytics Monetization a Try
As the world’s largest retailer, Walmart knows a thing or two about selling. With the launch of its new Walmart Data Ventures arm earlier this year, the company launched new offerings in its Walmart Luminate line, such as Shopper Behavior, Channel Performance, and Customer Perception. The retail giant isn’t only selling its partners data about its store sales (2 billion market baskets per quarter, the company says), but selling them prepackaged analytics insights, too.
October: Data Mesh Vs. Data Fabric: Understanding the Differences
There’s no denying it: Data fabrics and data meshes are hot. There’s also no denying that there’s a lot of confusion around these two concepts, which share some similarities but also have important differences. This article, which was published in October 2021, took a year to bubble up to the top and become the most-viewed story for a month, showing just how much demand there is for information on data meshes and data fabrics. Expect more interest in data meshes and data fabrics in the new year.
November: What Does Data and Analytics Need for 2023? Forrester Shares Predictions
Up to this point, Datanami had one ironclad rule: No new year predictions stories before Thanksgiving. (It was the only way to keep the PR people at bay.) For whatever reason, we broke the rule this year when we interviewed Forrester analyst Kim Herrington and published her analyst group’s predictions for 2023, and the result was the top-grossing story for the month. Go figure.
December: UC Berkeley Launches SkyPilot to Help Navigate Soaring Cloud Costs
One of the biggest emerging trends in 2022 was the rising cost of cloud computing. The folks running the computer science program at UC Berkeley recognized this, which is why they created Sky Computing as the follow-on to RISELab (which succeeded AMPLab). One of Sky Computing’s first creations is SkyPilot, which lets users run batch machine learning workloads on any cloud. There’s no telling whether it will be as hugely successful as Ray, which came out of RISELab, or Spark, which came out of AMPLab. But considering the attention staff writer Jaime Hampton’s story received, we’re not betting against it.
That’s it from us this year at Datanami. Happy holidays, and we’ll see you back here in 2023.