Friday, September 1, 2023

Aaand the New NiFi Champion is…


On May 3, 2023, Cloudera kicked off a contest called “Best in Flow” for NiFi developers to compete to build the best data pipelines. This blog congratulates our winner and reviews the top submissions.

On the verge of the release of NiFi 2.0, Cloudera VP of Engineering and NiFi founder Joe Witt, joined by principal committers Mark Payne and Matt Gillman, addressed the global community via a virtual event dubbed “Meet the Committers.” The team discussed NiFi’s origins and the journey to NiFi 2.0 as well as important features in the upcoming release, and surveyed the community about the dev/ops challenges of managing their own nodes. As part of the event, Cloudera kicked off the “Best in Flow” contest. The contest challenged developers to build data pipelines that represent their business use cases using Cloudera DataFlow. DataFlow is a cloud-native data service powered by Apache NiFi, with a streamlined user experience for development and deployment that enables true universal data distribution. For the contest, Cloudera made a sandbox environment available for developers to use DataFlow Public Cloud. We had more than 40 developers active in the environment and many high-quality contest submissions. But in the end there could only be one winner.

Best in Flow champion

So without any further ado, our winner and the new Best in Flow Champion is:

Vince Lombardo! Vince is a Senior Infrastructure Engineer at Wells Fargo, and he developed a cybersecurity pipeline to efficiently collect, process, and make data from an asset polling tool available for database ingestion. Cybersecurity is a common domain for DataFlow deployments due to the need for timely access to data across systems, tools, and protocols. What’s interesting about Vince’s application is that it cleverly uses “pagination” functionality to continuously distribute up-to-the-minute results from a tool that doesn’t always return a full set of results right away. For more detail on the winning flow, check out Vince’s GitHub page.

Vince’s winning flow

Vince began by funneling data from six API endpoints of an asset polling tool, containing cybersecurity and tech ops data, into two discrete data topics. The flow he built differentiates between a test and a true API call before initiating a secure login. The smart part comes next. Because the polling tool can take time to return queries, Vince added a processor that loops, returning the query status each time, until the query is complete. Completeness is estimated by comparing a test result with the “estimated total.” When a near match is detected, the data pull is triggered and then checked again for completeness before being transformed into rows and columns and merged into a batch for database ingestion.

Figure 1: The part of the flow that loops until the Tanium query has completed
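The looping step can be illustrated outside of NiFi. Vince’s flow is built from NiFi processors rather than code, so the following Python is only a minimal sketch of the same idea under stated assumptions: the `get_status` callable, the `tested` and `estimated_total` field names, and the tolerance, interval, and retry limits are all hypothetical and do not come from the winning flow or from any real polling tool’s API.

```python
import time
from typing import Callable, Dict


def wait_until_complete(
    get_status: Callable[[], Dict[str, int]],
    tolerance: float = 0.01,
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Poll a query's status until it looks complete.

    "Complete" here means the number of results seen so far ("tested",
    a hypothetical field) is within `tolerance` of the tool's own
    "estimated_total" (also a hypothetical field name).
    """
    for _ in range(max_polls):
        status = get_status()
        estimated = status["estimated_total"]
        tested = status["tested"]
        # Near match: accept once we are within the tolerance of the estimate.
        if estimated > 0 and tested >= estimated * (1 - tolerance):
            return status
        # Not there yet: wait, then ask for the status again.
        sleep(interval_s)
    raise TimeoutError(f"query not complete after {max_polls} polls")


if __name__ == "__main__":
    # Simulated status responses standing in for a real polling tool.
    fake_responses = iter(
        [
            {"tested": 10, "estimated_total": 100},
            {"tested": 60, "estimated_total": 100},
            {"tested": 99, "estimated_total": 100},
        ]
    )
    final = wait_until_complete(
        lambda: next(fake_responses),
        tolerance=0.02,
        sleep=lambda _seconds: None,  # skip real waiting in the demo
    )
    print("ready to pull data:", final)
```

Capping the number of polls is a design choice of this sketch: it keeps a query that never reaches its estimate from looping forever, which a production flow would likely handle with its own retry or failure routing.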

Vince’s flow met all of our criteria and was the clear contest winner. This flow is complete and adheres to NiFi best practices, being both efficient and highly secure. By employing pagination, this dataflow ensures a complete result set is available from a data source with highly variable query execution times. It’s deployable, has clear business value, and serves as a great example of universal data distribution in action. Congratulations Vince!

Runner up

Ramakrishna Sanikommu was our runner up, and he has published a post about his submission. RK built some simple flows to pull streaming data into Google Cloud Storage and Snowflake. Many developers use DataFlow to filter and enrich streams and ingest them into cloud data lakes and warehouses, where the ability to process and route anywhere makes DataFlow very effective. RK built multiple flows quickly, first pulling multiple data sources from a Google Pub/Sub topic and merging them into a file for ingestion into GCS. He then built a second flow to execute a Python script and load the data into Snowflake. His flows adhered to best practices and demonstrated some light transformations. RK also made proper use of the DataViewer to view the contents of a queue.

Figure 2: Ramakrishna’s first flow consuming data from Google PubSub and ingesting it into Google Cloud Storage

 

Figure 3: Ramakrishna’s second flow reading data from Google Cloud Storage and ingesting it into Snowflake

Summary and looking ahead

In less than 10 years since its inception, NiFi has achieved absolutely massive scale, both in terms of popularity and the size of deployments. NiFi’s origins, however, were quite simple: for any two systems to work together, there are quite a few things that must agree. They must not only speak some common data language but also account for myriad things like relevance, security, priority, authorization, and so on. NiFi was built as a kind of Swiss Army Knife to quickly connect different systems and coordinate dataflows from one to another using an intuitive no-code development canvas.

Since acquiring the company primarily responsible for maintaining the NiFi code base in 2015, Cloudera has continued to pour resources into the open source project, which now boasts more than 500 contributors across the globe and thousands of active community members in Slack. NiFi has evolved considerably, staying ahead of security vulnerabilities and adding connectors with releases every quarter. The “Best in Flow” contest was a great deal of fun, and demonstrated the appetite for community around Apache NiFi. Here at Cloudera we’re excited to host future events for NiFi developers, so stay tuned to find out what’s next. To test drive Cloudera DataFlow yourself, request a trial of Cloudera Data Platform in the Public Cloud: https://www.cloudera.com/campaign/try-cdp-public-cloud.html
