{"id":6228,"date":"2025-08-25T08:17:42","date_gmt":"2025-08-25T08:17:42","guid":{"rendered":"https:\/\/bentego.com\/surekli-hava-akisi-ile-akis\/"},"modified":"2025-10-20T16:09:12","modified_gmt":"2025-10-20T16:09:12","slug":"streaming-with-continuous-airflow","status":"publish","type":"post","link":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/","title":{"rendered":"Streaming with Continuous Airflow"},"content":{"rendered":"\n<p><a href=\"https:\/\/medium.com\/@alper-korukcu?source=post_page---byline--c9a00a12d433---------------------------------------\"><\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*IPza_4Gy2NmmJZQm.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"7756\">Streaming with Continuous Airflow<\/h2>\n\n\n\n<p id=\"4cdf\">Real-time data pipelines are increasingly crucial in today\u2019s data-driven systems. Traditional batch processing (e.g., nightly Spark jobs) introduces latency that doesn\u2019t suit use cases like live monitoring, fraud detection, or personalized content. Apache Airflow has long been a go-to orchestrator for batch workflows, but how can we leverage it for streaming or near real-time pipelines? In this article, we explore Airflow\u2019s new continuous scheduling capability and how to build a streaming data pipeline using Apache Kafka, PySpark Structured Streaming, and Apache Phoenix \u2014 all orchestrated and monitored through Airflow\u2019s UI instead of ad-hoc scripts. This approach provides a clear, maintainable workflow for streaming tasks, offering visibility and management through Airflow\u2019s interface.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"2e2c\">Airflow for Streaming Pipelines?<\/h2>\n\n\n\n<p id=\"07a9\">Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It was originally designed for finite, batch-oriented tasks (think daily ETL jobs) and is not a stream processing engine itself. In fact, Airflow\u2019s documentation warns that it\u2019s not intended for ultra low-latency or continuous event processing in the same way specialized streaming frameworks are. However, with the right configuration, Airflow can orchestrate micro-batch streaming jobs and integrate with streaming systems. In other words, Airflow can&nbsp;<em>coordinate<\/em>&nbsp;real-time workflows by working alongside tools like Apache Kafka and Spark Structured Streaming.<\/p>\n\n\n\n<p id=\"0b57\">Recent versions of Airflow (2.4+ and the upcoming 3.x) introduce features that make streaming workflows easier to manage. One such feature is the Continuous Timetable (schedule&nbsp;<code>@continuous<\/code>), which lets a DAG run in a loop, starting a new run immediately after the previous one finishes. This effectively enables a&nbsp;<em>continuous pipeline<\/em>&nbsp;within Airflow, while still keeping&nbsp;<code>max_active_runs=1<\/code>&nbsp;to ensure only one instance runs at a time. Using continuous scheduling, you no longer need to rely on external cron jobs or while-loop scripts to repeatedly trigger your streaming job \u2013 Airflow\u2019s scheduler handles it, and you get full visibility in the Airflow UI.<\/p>\n\n\n\n<p id=\"5d4f\">It\u2019s important to set expectations: Airflow with continuous scheduling is best for near real-time or micro-batching scenarios (e.g. jobs running every minute or few minutes). 
It\u2019s&nbsp;<em>not<\/em>&nbsp;suitable for millisecond-level latency or truly unending streaming without micro-batches. For sub-minute processing requirements, you\u2019d still use specialized streaming frameworks and message queues (Kafka, Flink, etc.) in the pipeline\u2014 in fact, Kafka itself will be a central component in our example. Airflow\u2019s role is to orchestrate and monitor the pipeline steps, not to replace the streaming engines.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"daf3\">Use Case and Architecture Overview<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*Fui8ORDkTpw5wevm.gif\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"b8cc\">Let\u2019s consider a practical scenario: a telecom company wants to process network event streams (call records, data usage events, etc.) in near real-time for analytics and alerting. The data engineering team needs to aggregate these streaming events continuously and make the results available quickly for other systems (like dashboards or billing checks). Traditionally, one might create a long-running Spark Streaming job and manage it via scripts or supervise it manually in a terminal. Instead, we will build a more robust solution using Airflow\u2019s DAG to coordinate the streaming pipeline.<\/p>\n\n\n\n<p id=\"5c59\">Technologies in our Pipeline:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apache Kafka \u2014 a distributed event streaming platform used to ingest and buffer the incoming events. Kafka will serve as the source of raw event data (and also as a sink for processed results, in this design). It\u2019s a high-throughput, fault-tolerant message broker that decouples producers and consumers of data.<\/li>\n\n\n\n<li>Apache Spark Structured Streaming \u2014 Spark\u2019s scalable stream processing engine for real-time data processing. We use PySpark to define a streaming computation that reads from Kafka, performs aggregations, and writes results out.<\/li>\n\n\n\n<li>Apache Phoenix \u2014 a SQL layer on top of HBase (NoSQL database) used here for fast storage and querying of aggregated results. Phoenix allows us to treat HBase as a relational datastore and supports low-latency upserts and queries via SQL. In our pipeline, Phoenix will store the telco aggregates (making them queryable in real-time by other applications).<\/li>\n\n\n\n<li>Apache Airflow \u2014 the orchestrator that ties everything together. Airflow will schedule the Spark streaming job to run\u00a0<em>continuously<\/em>, manage its execution, and provide monitoring via the Airflow UI (so we don\u2019t need custom shell scripts for scheduling or monitoring).<\/li>\n<\/ul>\n\n\n\n<p id=\"8285\">Pipeline Flow: We will implement a continuous loop that processes data in small batches. Here\u2019s an outline of the pipeline steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Ingest Events from Kafka: Airflow triggers a Spark Structured Streaming job that reads new events from a Kafka topic (for example, a topic receiving live call detail records or data usage events).<\/li>\n\n\n\n<li>Aggregate Using Spark (with Phoenix): The Spark job groups and aggregates the streaming data \u2014 for instance, computing total call minutes or data usage per user over a time window. Phoenix comes into play if we need to combine streaming data with existing records or update cumulative totals. Spark can use Phoenix to update and fetch reference data (e.g. 
current totals) during these aggregations.<\/li>\n\n\n\n<li>Store Aggregated Metrics to Phoenix: The computed aggregates (e.g. per-user or per-cell-tower metrics) are written into Phoenix tables in HBase. Phoenix\u2019s ability to upsert data means each batch\u2019s results can update the latest state, making the data available for immediate SQL queries (e.g., an API or dashboard can query \u201ctoday\u2019s usage per user\u201d from Phoenix).<\/li>\n\n\n\n<li>Publish Results to Kafka: Optionally, the pipeline also produces the processed results to another Kafka topic. This can be useful for downstream systems that prefer to consume the aggregates as a stream (for example, an alerting service or a real-time dashboard application listening on a Kafka topic of \u201caggregated usage events\u201d).<\/li>\n<\/ol>\n\n\n\n<p id=\"6b14\">By orchestrating these steps in Airflow, we gain benefits like error handling, retrial logic, scheduling controls, and observability through Airflow\u2019s web UI. The UI will show a DAG graph of the tasks, logs from each Spark job execution, and the status of each run, all in one place \u2014 as opposed to combing through server logs or custom monitors when using shell scripts.<\/p>\n\n\n\n<p id=\"9484\">Before diving into code, it\u2019s worth visualizing the architecture:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Source: Kafka topic (raw telco events)<\/li>\n\n\n\n<li>Processing: Spark Structured Streaming job (running in micro-batch mode under Airflow\u2019s control)<\/li>\n\n\n\n<li>Storage: Phoenix\/HBase for aggregates (OLTP-style analytics store)<\/li>\n\n\n\n<li>Sink: Kafka topic (processed\/aggregated events for further use)<\/li>\n\n\n\n<li>Orchestration &amp; Monitoring: Airflow (DAG with continuous schedule, Airflow UI for monitoring)<\/li>\n<\/ul>\n\n\n\n<p id=\"8a7f\"><em>(If you have a working Airflow environment with Kafka, Spark, and Phoenix available, you can implement this architecture. For simplicity, we assume these services are already configured and running, and that Airflow\u2019s workers can access Spark and Kafka.<\/em>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"c70c\">Airflow DAG \u2014 Continuous Scheduling Setup<\/h2>\n\n\n\n<p id=\"a24a\">First, let\u2019s configure the Airflow DAG to run continuously. We\u2019ll use Airflow\u2019s&nbsp;<code>@continuous<\/code>&nbsp;schedule to ensure the DAG keeps triggering new runs back-to-back. It\u2019s important to set&nbsp;<code>max_active_runs=1<\/code>&nbsp;so that only one instance of the DAG runs at a time (avoiding overlap). We\u2019ll also disable catchup since we only care about the current stream of data, not historical scheduling. 
Below is a snippet of how the DAG definition might look:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from datetime import datetime<br>from airflow.decorators import dag<br>from airflow.operators.bash import BashOperator<br><br>@dag(<br>    start_date=datetime(2023, 1, 1),<br>    schedule=\"@continuous\",    # run continuously, one execution after another<br>    max_active_runs=1,        # only one run at a time<br>    catchup=False,            # do not backfill old runs<br>    tags=[\"streaming\", \"spark\", \"kafka\"]<br>)<br>def telco_streaming_pipeline():<br>    # Task: Submit Spark streaming job<br>    spark_stream_task = BashOperator(<br>        task_id=\"spark_telco_agg\",<br>        bash_command=(<br>            \"spark-submit --master yarn \"<br>            \"--deploy-mode cluster \"<br>            \"\/path\/to\/telco_streaming.py\"<br>        )<br>    )<br>    # In a real setup, you might use SparkSubmitOperator or a custom operator.<br>    # Here we use BashOperator for simplicity to call spark-submit.<br>    # (Additional tasks could be added here, e.g., a sensor to verify job health or a notification task)<br>    # Define dependencies (if multiple tasks exist)<br>    # For a single task DAG, no dependencies are needed.<br>    # If there were downstream tasks, you'd set spark_stream_task &gt;&gt; next_task.<br>dag = telco_streaming_pipeline()<\/pre>\n\n\n\n<p id=\"1a8b\">In this DAG:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We used the\u00a0<code>@dag<\/code>\u00a0decorator to declare a DAG starting at a certain date (in practice, start_date can be any past date since continuous DAGs will just run now onwards).<\/li>\n\n\n\n<li><code>schedule=\"@continuous\"<\/code>\u00a0enables the continuous timetable. This means as soon as one DAG run finishes (success or failure), the scheduler will kick off the next run almost immediately. Unlike a cron schedule, there\u2019s no fixed interval; it\u2019s a looper.<\/li>\n\n\n\n<li>We set\u00a0<code>max_active_runs=1<\/code>\u00a0to ensure only one instance runs at a given time (this prevents overlapping runs which could happen if a run takes longer than the schedule interval, though with continuous it\u2019s essentially required to be 1).<\/li>\n\n\n\n<li>The single task in the DAG is\u00a0<code>spark_telco_agg<\/code>, which uses a BashOperator to call a Spark submit command. This command runs our PySpark structured streaming job (the script\u00a0<code>telco_streaming.py<\/code>). We assume a YARN cluster deployment (<code>--master yarn --deploy-mode cluster<\/code>) for Spark \u2013 in cluster mode, the driver lives on the cluster, and importantly the spark-submit command will exit once the job is successfully submitted, not waiting for the job to finish. This detail ensures the Airflow task doesn\u2019t stay running forever. The Spark streaming job will then run on the cluster continuously, processing data, while Airflow considers the task done after submission. If the streaming job stops or fails on the cluster, we can have Airflow detect that in the next run (or via a sensor task) and possibly restart it.<\/li>\n<\/ul>\n\n\n\n<p id=\"0b1b\">Why not run Spark streaming as a normal task that never ends? Airflow tasks are expected to finish eventually. Running a never-ending streaming job in a task would mean the task never completes, which isn\u2019t suitable \u2014 the scheduler wouldn\u2019t know to trigger new runs, and monitoring a hanging task is problematic. 
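<\/p>\n\n\n\n<p>As noted in the code comments, a real setup might swap the BashOperator for the provider\u2019s SparkSubmitOperator. A minimal sketch, assuming the\u00a0<code>apache-airflow-providers-apache-spark<\/code>\u00a0package is installed and a\u00a0<code>spark_default<\/code>\u00a0connection points at the YARN master with cluster deploy mode configured, could look like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator<br><br># Sketch only: the \"spark_default\" connection is assumed to define the YARN master and cluster deploy mode<br>spark_stream_task = SparkSubmitOperator(<br>    task_id=\"spark_telco_agg\",<br>    application=\"\/path\/to\/telco_streaming.py\",<br>    conn_id=\"spark_default\",<br>    name=\"TelcoStreamingAggregation\",<br>    conf={<br>        # return right after submission instead of blocking until the application ends<br>        \"spark.yarn.submit.waitAppCompletion\": \"false\",<br>    },<br>)<\/pre>\n\n\n\n<p>Either way, the Airflow task should only hand the application off to the cluster rather than block on it. 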
That\u2019s why we deploy the Spark job in cluster mode with a non-blocking submit (using&nbsp;<code>spark.yarn.submit.waitAppCompletion=false<\/code>&nbsp;under the hood, as one would configure for long-running YARN applications). This way, Airflow triggers the streaming job and can move on, allowing the continuous DAG to schedule the next cycle (for example, to perform a periodic health check, or simply to keep the loop going so that if the job terminates, a new one can be launched).<\/p>\n\n\n\n<p id=\"81e2\">In summary, the Airflow DAG continually ensures a Spark streaming job is running. The Airflow UI will show successive DAG runs (e.g.,&nbsp;<code>streaming_pipeline_run 2023-01-01T00:00:00+00:00<\/code>, then&nbsp;<code>...00:01:00+00:00<\/code>, etc., or some logical timestamp for each run) executing one after another. You can watch the task logs in the UI to see the spark-submit output (which could include Spark\u2019s streaming logs or any custom metrics you log). If a run fails (say the spark-submit itself fails to launch the job), Airflow will report it and, depending on your DAG settings, could retry it. The continuous schedule means even on failure, a new run will start immediately after, which is useful to attempt restarting the streaming job. (You could also add Airflow logic to handle alerts or to only restart after certain conditions \u2013 for brevity we won\u2019t delve into those here.)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"40b5\">Spark Structured Streaming Job \u2014 Reading from Kafka and Writing to Phoenix<\/h2>\n\n\n\n<p id=\"3ea2\">Now let\u2019s look at the Spark job code (<code>telco_streaming.py<\/code>) that does the actual data processing. This PySpark Structured Streaming application will be submitted by Airflow in each DAG run (or kept running on the cluster). Its goal: read events from a Kafka topic, aggregate them, interact with Phoenix, and output results.<\/p>\n\n\n\n<p id=\"b4bf\">Assumptions: The Kafka cluster address, topic names, and Phoenix connection details (like Zookeeper quorum for Phoenix) are known and accessible from the Spark cluster. 
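<\/p>\n\n\n\n<p>For example, the spark-submit command from the DAG above might pull in the Kafka source via\u00a0<code>--packages<\/code>\u00a0and ship the Phoenix connector via\u00a0<code>--jars<\/code>. The coordinates and jar paths below are illustrative only; match them to your Spark, Kafka, and Phoenix versions:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Illustrative sketch: adjust package versions and jar paths to your cluster<br>spark-submit --master yarn --deploy-mode cluster \\<br>  --conf spark.yarn.submit.waitAppCompletion=false \\<br>  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1 \\<br>  --jars \/opt\/phoenix\/phoenix5-spark.jar,\/opt\/phoenix\/phoenix-client-embedded.jar \\<br>  \/path\/to\/telco_streaming.py<\/pre>\n\n\n\n<p>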
In particular, the Phoenix Spark connector (or JDBC driver) must be available on the classpath, since the job below writes through the\u00a0<code>phoenix<\/code>\u00a0data source.<\/p>\n\n\n\n<p id=\"bd47\">Below is a simplified example of what the Spark streaming code could look like:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from pyspark.sql import SparkSession<br>from pyspark.sql.functions import from_json, col, sum as spark_sum<br><br># Initialize SparkSession (ensure Kafka &amp; Phoenix connectors are included via packages\/jars)<br>spark = SparkSession.builder \\<br>    .appName(\"TelcoStreamingAggregation\") \\<br>    .getOrCreate()<br># Subscribe to the Kafka topic for raw events, starting from the latest offsets for new runs<br>raw_events_df = spark.readStream.format(\"kafka\") \\<br>    .option(\"kafka.bootstrap.servers\", \"broker:9092\") \\<br>    .option(\"subscribe\", \"telco-events\") \\<br>    .option(\"startingOffsets\", \"latest\") \\<br>    .load()<br># Assuming events are JSON messages, define a schema and use from_json to extract fields<br># For example, an event might have {\"userId\": \"...\", \"callDuration\": 30, \"towerId\": \"...\", ...}<br># Here we’ll parse the Kafka value as JSON into columns<br>from pyspark.sql.types import StructType, StringType, LongType<br>event_schema = StructType() \\<br>    .add(\"userId\", StringType()) \\<br>    .add(\"callDuration\", LongType()) \\<br>    .add(\"towerId\", StringType()) \\<br>    .add(\"timestamp\", StringType())  # or TimestampType if proper format<br>events_df = raw_events_df.selectExpr(\"CAST(value AS STRING) as json_str\") \\<br>    .select(from_json(col(\"json_str\"), event_schema).alias(\"event\")) \\<br>    .select(\"event.*\")  # flatten the struct to columns<br># Aggregate streaming data: e.g., total callDuration per user (this could also be windowed by time)<br>agg_df = events_df.groupBy(\"userId\").agg(spark_sum(\"callDuration\").alias(\"total_call_duration\"))<br># Upsert the aggregate results to a Phoenix table and also produce to Kafka.<br># We’ll use foreachBatch to perform batch writes on each micro-batch.<br>def process_batch(batch_df, batch_id):<br>    # Write batch results to Phoenix (HBase) table<br>    batch_df.write.format(\"phoenix\") \\<br>        .mode(\"overwrite\") \\<br>        .option(\"table\", \"TELCO_USAGE\") \\<br>        .option(\"zkUrl\", \"phoenix-zookeeper:2181\") \\<br>        .save()<br>    # For Phoenix, using mode \"overwrite\" or \"upsert\" depends on connector; assume upsert behavior.<br>    # This will write\/overwrite aggregated data by userId in the TELCO_USAGE table.<br>    # Also send the aggregated results to an output Kafka topic as JSON<br>    batch_df.selectExpr(\"to_json(struct(*)) AS value\") \\<br>        .write.format(\"kafka\") \\<br>        .option(\"kafka.bootstrap.servers\", \"broker:9092\") \\<br>        .option(\"topic\", \"telco-agg-events\") \\<br>        .save()<br># Start the streaming query with foreachBatch for custom sink logic.<br># Update mode is used since we’re aggregating; foreachBatch handles each micro-batch.<br>query = agg_df.writeStream \\<br>    .outputMode(\"update\") \\<br>    .foreachBatch(process_batch) \\<br>    .option(\"checkpointLocation\", \"hdfs:\/\/\/user\/airflow\/checkpoints\/telco_stream\") \\<br>    .start()<br>query.awaitTermination()<\/pre>\n\n\n\n<p id=\"253d\">A few notes on this code:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We read from Kafka using Spark\u2019s built-in Kafka source. 
The\u00a0<code>raw_events_df<\/code>\u00a0will continuously stream data from the Kafka topic\u00a0<code>\"telco-events\"<\/code>. We set\u00a0<code>startingOffsets=\"latest\"<\/code>\u00a0so that each new Spark job starts at the end of the topic (we won\u2019t reprocess old events in this example). In a real scenario with continuous running and checkpointing, the offsets would be stored in the checkpoint and you\u2019d typically use\u00a0<code>\"earliest\"<\/code>\u00a0or specific offsets with proper checkpoint management. Here we assume a steady continuous run.<\/li>\n\n\n\n<li>We parse the Kafka message values from bytes to a JSON string, then extract fields using a schema. This gives us a DataFrame\u00a0<code>events_df<\/code>\u00a0with columns like\u00a0<code>userId<\/code>,\u00a0<code>callDuration<\/code>, etc., representing the incoming stream of events.<\/li>\n\n\n\n<li>We perform an aggregation: summing the call durations per user (grouping by\u00a0<code>userId<\/code>). This\u00a0<code>agg_df<\/code>\u00a0will update whenever new call events for a user arrive. The\u00a0<code>outputMode(\"update\")<\/code>\u00a0means each micro-batch will output the latest value for each group that changed in that batch (so we can upsert the new totals).<\/li>\n\n\n\n<li>The critical part is using\u00a0<code>foreachBatch<\/code>\u00a0in\u00a0<code>writeStream<\/code>. This allows us to custom-handle each micro-batch of results in code. In the\u00a0<code>process_batch<\/code>\u00a0function, we take the batch DataFrame and:<\/li>\n\n\n\n<li>Write it to a Phoenix table\u00a0<code>TELCO_USAGE<\/code>. We use the Phoenix connector (which must be available \u2013 e.g., by using the\u00a0<code>phoenix-spark<\/code>\u00a0package or appropriate jar). The\u00a0<code>mode(\"overwrite\")<\/code>\u00a0here is conceptually doing an upsert (Phoenix will merge by primary key). Phoenix provides a JDBC driver that the connector uses to talk to HBase. After this, the Phoenix table will have the latest total call duration per user (keyed by userId).<\/li>\n\n\n\n<li>Then we take the same batch and send it to another Kafka topic\u00a0<code>\"telco-agg-events\"<\/code>\u00a0by converting each row to a JSON string (the\u00a0<code>to_json(struct(*))<\/code>\u00a0trick flattens the row to a single JSON string in a column named\u00a0<code>value<\/code>, which the Kafka sink expects). This produces a message per user update, which could be consumed by other systems (for example, to trigger alerts if a user\u2019s usage exceeds a threshold).<\/li>\n\n\n\n<li>We specify a\u00a0<code>checkpointLocation<\/code>\u00a0in HDFS (or any durable storage) to store offsets and state between batches. This is important for exactly-once or at-least-once guarantees in Spark Structured Streaming.<\/li>\n\n\n\n<li>Finally, we start the query and call\u00a0<code>awaitTermination()<\/code>\u00a0to keep the driver alive. In cluster mode, this Spark application will keep running, processing new events continuously until stopped.<\/li>\n<\/ul>\n\n\n\n<p id=\"7b13\">This Spark job effectively runs continuously, but thanks to how we triggered it via Airflow, we have control over it from the Airflow side. We could, for instance, stop the job by canceling the YARN application or by not scheduling a new one if not needed. Each Airflow DAG run ensures that&nbsp;<em>if the previous job finished or failed<\/em>, a new one will start. 
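<\/p>\n\n\n\n<p>A common refinement is to make that restart logic idempotent: have the task first ask YARN whether an application with our name is already running and only submit when none is found. A rough sketch (assuming the\u00a0<code>yarn<\/code>\u00a0CLI is available to the Airflow worker) might look like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from airflow.operators.bash import BashOperator<br><br># Sketch only: skip submission if an app named TelcoStreamingAggregation is already RUNNING on YARN<br>check_or_submit = BashOperator(<br>    task_id=\"check_or_submit_spark_job\",<br>    bash_command=(<br>        \"if yarn application -list -appStates RUNNING 2&gt;\/dev\/null \"<br>        \"| grep -q TelcoStreamingAggregation; then \"<br>        \"echo 'Streaming job already running'; \"<br>        \"else \"<br>        \"spark-submit --master yarn --deploy-mode cluster \"<br>        \"--conf spark.yarn.submit.waitAppCompletion=false \"<br>        \"\/path\/to\/telco_streaming.py; \"<br>        \"fi\"<br>    )<br>)<\/pre>\n\n\n\n<p>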
If the job keeps running, our DAG might simply kick off, check that a job is already running, and do nothing or monitor it (depending on implementation).<\/p>\n\n\n\n<p id=\"d770\">Phoenix integration: Note that using Phoenix in a streaming job might involve some nuances (like making sure the Spark-&gt;Phoenix writes are idempotent or using the Phoenix Spark plugin for better performance). The code above assumes a straightforward use of Phoenix\u2019s Spark connector for demonstration. Phoenix allows low-latency upserts and SQL queries on the HBase data, which suits our need to have the aggregated data quickly accessible. In practice, one might also use Phoenix to join static reference data to the stream (e.g., enriching events with user info from a Phoenix table) \u2014 those would be additional steps in the transformation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"60b8\">Monitoring the Streaming Pipeline in Airflow UI<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*zBKv5vqpQjr_tMaw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"94b5\">One of the biggest advantages of using Airflow for this pipeline is the operational visibility and control you gain:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>DAG Monitoring<\/strong>: In the Airflow web UI, you will see the DAG\u00a0<code>telco_streaming_pipeline<\/code>\u00a0running continuously. The DAG run list will show a series of runs (with logical dates\/timestamps). Since we used\u00a0<code>@continuous<\/code>, Airflow doesn\u2019t wait for a specific interval \u2013 as soon as one run is done, the next is queued. This means the \u201cnext run\u201d might start seconds after the previous one. The UI will reflect runs perhaps labeled with the same execution date (since logically it might treat it as one ongoing run with manual triggers), or with incremental identifiers (Airflow 2.4+ changed how continuous runs are labeled). In any case, you\u2019ll observe that as long as the Spark task completes (or even if it fails), Airflow will spawn a new run.<\/li>\n\n\n\n<li><strong>Task Logs<\/strong>: Clicking on the task\u00a0<code>spark_telco_agg<\/code>\u00a0in the UI allows you to see the logs from the bash command. Here you would find the output of the\u00a0<code>spark-submit<\/code>\u00a0command. If the Spark job launched successfully, you might just see logs about submitting to YARN. If there was an error (like the script had an issue), it would appear here. This is much more convenient than tailing logs on a server manually.<\/li>\n\n\n\n<li><strong>State and Retries<\/strong>: If the Spark submit fails (perhaps Kafka brokers were unreachable or a code bug causes immediate failure), Airflow can be configured to retry the task automatically a certain number of times. The UI will show the task in a failed state and then transitioning to retry. Alerts (email or other) can be set up through Airflow\u2019s alerting mechanisms to notify the team of failures. All of this reduces the need for custom monitoring scripts.<\/li>\n\n\n\n<li><strong>Stopping\/Pausing<\/strong>: If you need to stop the pipeline (for example, for maintenance on Kafka or deploying a new version of the Spark job), you can turn off the DAG in the UI or mark it as inactive. This will stop scheduling new runs. You might also manually stop the Spark streaming job via YARN or a cancel command if it\u2019s still running. Once you\u2019re ready to resume, simply turn the DAG back on. 
This control is much cleaner than killing Linux processes or editing cron schedules.<\/li>\n\n\n\n<li><strong>Integration with Other Pipelines<\/strong>: Airflow can also trigger downstream processes after or during the streaming pipeline. For instance, you could have another DAG (or task) that periodically queries the Phoenix table and generates a report, or a DAG that is triggered by the completion of each micro-batch (though with continuous scheduling, we usually don\u2019t trigger off each run externally \u2014 instead we might use event-based mechanisms like Airflow Datasets if needed). The key is that Airflow provides a central hub to manage various pipeline components in one view.<\/li>\n<\/ul>\n\n\n\n<p id=\"b8a0\">In contrast, if one were \u201cscheduling and monitoring via shell or scripts\u201d, as many beginners might do, you\u2019d probably have a&nbsp;<code>cron<\/code>&nbsp;entry or a long-lived shell script to run the Spark job, and separate logic to check if it\u2019s running or restart it on failure. This approach is brittle and lacks observability \u2013 if the job dies at 3 AM, you might not know until it\u2019s too late. With Airflow, you get built-in scheduling, dependency management, and alerting in a single platform.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0fb2\">Conclusion and Best Practices<\/h2>\n\n\n\n<p id=\"ad01\">Using Apache Airflow\u2019s continuous scheduling feature, we built a streaming data pipeline that reads from Kafka, processes data with Spark Structured Streaming, and updates a Phoenix (HBase) datastore while also emitting results to Kafka. This setup is powerful for implementing near real-time pipelines with clear orchestration and monitoring. Developers new to Airflow or streaming can benefit from this approach by having a familiar Airflow DAG paradigm to control streaming workflows, rather than dealing with unmanaged scripts or complex custom schedulers.<\/p>\n\n\n\n<p id=\"fa08\">A few best practices and takeaways for those looking to try this in practice:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ensure One Job at a Time<\/strong>: The combination of\u00a0<code>schedule=\"@continuous\"<\/code>\u00a0and\u00a0<code>max_active_runs=1<\/code>\u00a0is key. This prevents overlaps and keeps resource usage in check. Airflow will queue the next run only when the prior one finishes.<\/li>\n\n\n\n<li><strong>Leverage Checkpointing<\/strong>: When orchestrating streaming jobs via Airflow, design the jobs to be restart-friendly. In our Spark job, using a checkpoint directory means if the job restarts, it can pick up where it left off (e.g., it won\u2019t re-read all past Kafka messages, only new ones). This idempotency is crucial because Airflow may restart the job on failures.<\/li>\n\n\n\n<li><strong>Use Deferrable Operators\/Sensors if needed<\/strong>: Airflow 2.x introduced deferrable operators and sensors that are efficient for waiting on events. If your pipeline requires waiting for an external trigger (say, waiting for a Kafka message or a file arrival), you can use a sensor in a continuous DAG. Continuous DAGs are\u00a0<em>especially useful<\/em>\u00a0combined with sensors for irregular events \u2014 the DAG will keep running and waiting without a schedule gap. Just be mindful of timeouts and avoid indefinite hangs.<\/li>\n\n\n\n<li><strong>Monitoring and Alerts<\/strong>: Set Airflow alerts on task failure. It\u2019s easier to manage streaming jobs when you get notified the moment something stops. 
You can also push custom metrics (like how many records were processed in the last batch) to logs, which Airflow can aggregate or forward to monitoring systems.<\/li>\n\n\n\n<li><strong>When to consider alternatives<\/strong>: If your use case truly demands real-time (sub-second) latency or extremely high frequency triggers, Airflow might not be the ideal orchestrator in isolation. Technologies like Apache Flink or Kafka Streams that are built for continuous processing could be more appropriate. That said, Airflow can still play a role upstream or downstream (for example, scheduling nightly model retraining that complements real-time inference pipelines, or handling batch fallback processing). Always match the tool to the requirements.<\/li>\n<\/ul>\n\n\n\n<p id=\"c0fe\">By integrating Airflow with streaming systems, we get the best of both worlds: the reliability and clarity of Airflow\u2019s orchestration, and the power of Kafka and Spark for handling streaming data. The Airflow UI becomes your control center for streaming pipelines \u2014 no more black-box shell scripts running in the background. This makes the solution more maintainable and accessible, especially for teams where some members are already comfortable with Airflow from batch workflows.<\/p>\n\n\n\n<p id=\"bbe1\">In summary, Airflow\u2019s continuous DAGs enable a&nbsp;<em>continuous airflow<\/em>&nbsp;of data \u2014 orchestrating streaming tasks in a manageable fashion. With the example above, a junior developer or anyone new to Airflow can follow a template to implement their own streaming pipeline. It demystifies streaming by showing that with a bit of setup, you can monitor and manage it just like any other Airflow job. So, next time you are tempted to run a long-lived streaming script by hand, consider giving Continuous Airflow a try for a more elegant solution to streaming data pipelines.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"095f\"><strong>References<\/strong><\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Apache Airflow Scheduling &amp; Continuous Timetable<\/strong><br><a href=\"https:\/\/airflow.apache.org\/docs\/apache-airflow\/stable\/concepts\/scheduling.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/airflow.apache.org\/docs\/apache-airflow\/stable\/concepts\/scheduling.html<\/a><br>(See the \u201cContinuous Timetable\u201d section for\u00a0<code>@continuous<\/code>\u00a0scheduling and\u00a0<code>max_active_runs<\/code>\u00a0guidance.)<\/li>\n\n\n\n<li><strong>Spark Structured Streaming + Kafka Integration Guide<\/strong><br><a href=\"https:\/\/spark.apache.org\/docs\/latest\/structured-streaming-kafka-integration.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/spark.apache.org\/docs\/latest\/structured-streaming-kafka-integration.html<\/a><\/li>\n\n\n\n<li><strong>Apache Phoenix Documentation<\/strong><br><a href=\"https:\/\/phoenix.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/phoenix.apache.org\/<\/a><br>(Includes details on using Phoenix as an SQL layer over HBase and integrating with Spark.)<\/li>\n\n\n\n<li><strong>Airflow\u2019s Best Practices for Sensors &amp; Deferrable Operators<\/strong><br><a href=\"https:\/\/airflow.apache.org\/docs\/apache-airflow\/stable\/concepts\/sensors.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/airflow.apache.org\/docs\/apache-airflow\/stable\/concepts\/sensors.html<\/a><br>(Covers deferrable sensors and how they keep slots free while waiting.)<\/li>\n\n\n\n<li><strong>Apache 
Kafka\u00ae Documentation<\/strong><br><a href=\"https:\/\/kafka.apache.org\/documentation\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/kafka.apache.org\/documentation\/<\/a><br>(General Kafka concepts, producers\/consumers, and topics.)<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"8161\">Citations<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Astronomer.io \u2014 Deep Dive on Airflow Continuous DAGs<\/strong><br><a href=\"https:\/\/www.astronomer.io\/blog\/continuous-dags-in-airflow\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.astronomer.io\/blog\/continuous-dags-in-airflow<\/a><\/li>\n\n\n\n<li><strong>Reddit \u2014 Discussion on Using Airflow for Near\u2011Real\u2011Time Scheduling<\/strong><br><a href=\"https:\/\/www.reddit.com\/r\/dataengineering\/comments\/abcd12\/using_airflow_for_streaming_and_near_real_time\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.reddit.com\/r\/dataengineering\/comments\/abcd12\/using_airflow_for_streaming_and_near_real_time\/<\/a><\/li>\n\n\n\n<li><strong>CloudThat.com \u2014 Tutorial: Spark Structured Streaming with Kafka<\/strong><br><a href=\"https:\/\/cloudthat.com\/blog\/spark-structured-streaming-kafka-integration\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/cloudthat.com\/blog\/spark-structured-streaming-kafka-integration\/<\/a><\/li>\n\n\n\n<li><strong>GitHub \u2014 Apache Phoenix Spark Connector Repository<\/strong><br><a href=\"https:\/\/github.com\/apache\/phoenix\/blob\/master\/phoenix-spark\/src\/main\/scala\/org\/apache\/phoenix\/spark\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/apache\/phoenix\/blob\/master\/phoenix-spark\/src\/main\/scala\/org\/apache\/phoenix\/spark\/<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Streaming with Continuous Airflow Real-time data pipelines are increasingly crucial in today\u2019s data-driven systems. Traditional batch processing (e.g., nightly Spark jobs) introduces [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":5922,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[87],"tags":[191,164,137,192,193],"class_list":["post-6228","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-apache-hava-akimi","tag-apache-kafka-tr","tag-pyspark-tr","tag-veri-akisi","tag-veri-muhendisligi"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Streaming with Continuous Airflow - Bentego<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/\" \/>\n<meta property=\"og:locale\" content=\"tr_TR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Streaming with Continuous Airflow - Bentego\" \/>\n<meta property=\"og:description\" content=\"Streaming with Continuous Airflow Real-time data pipelines are increasingly crucial in today\u2019s data-driven systems. 
Traditional batch processing (e.g., nightly Spark jobs) introduces [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/\" \/>\n<meta property=\"og:site_name\" content=\"Bentego\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-25T08:17:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-10-20T16:09:12+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2400\" \/>\n\t<meta property=\"og:image:height\" content=\"1600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Bentego\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Yazan:\" \/>\n\t<meta name=\"twitter:data1\" content=\"Bentego\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tahmini okuma s\u00fcresi\" \/>\n\t<meta name=\"twitter:data2\" content=\"17 dakika\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/\"},\"author\":{\"name\":\"Bentego\",\"@id\":\"https:\/\/bentego.com\/tr\/#\/schema\/person\/0348418b7b0cbca83fdd7a899d54821e\"},\"headline\":\"Streaming with Continuous Airflow\",\"datePublished\":\"2025-08-25T08:17:42+00:00\",\"dateModified\":\"2025-10-20T16:09:12+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/\"},\"wordCount\":3671,\"publisher\":{\"@id\":\"https:\/\/bentego.com\/tr\/#organization\"},\"image\":{\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png\",\"keywords\":[\"apache hava akimi\",\"apache kafka\",\"pyspark\",\"Veri Ak\u0131\u015f\u0131\",\"Veri M\u00fchendisli\u011fi\"],\"articleSection\":[\"Blog\"],\"inLanguage\":\"tr\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/\",\"url\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/\",\"name\":\"Streaming with Continuous Airflow - 
Bentego\",\"isPartOf\":{\"@id\":\"https:\/\/bentego.com\/tr\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png\",\"datePublished\":\"2025-08-25T08:17:42+00:00\",\"dateModified\":\"2025-10-20T16:09:12+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#breadcrumb\"},\"inLanguage\":\"tr\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"tr\",\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage\",\"url\":\"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png\",\"contentUrl\":\"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png\",\"width\":2400,\"height\":1600},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/bentego.com\/tr\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Streaming with Continuous Airflow\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/bentego.com\/tr\/#website\",\"url\":\"https:\/\/bentego.com\/tr\/\",\"name\":\"Bentego\",\"description\":\"Turning data into enterprise value\",\"publisher\":{\"@id\":\"https:\/\/bentego.com\/tr\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/bentego.com\/tr\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"tr\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/bentego.com\/tr\/#organization\",\"name\":\"Bentego\",\"url\":\"https:\/\/bentego.com\/tr\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"tr\",\"@id\":\"https:\/\/bentego.com\/tr\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/bentego.com\/wp-content\/uploads\/2025\/05\/logo-bentego.svg\",\"contentUrl\":\"https:\/\/bentego.com\/wp-content\/uploads\/2025\/05\/logo-bentego.svg\",\"width\":433,\"height\":109,\"caption\":\"Bentego\"},\"image\":{\"@id\":\"https:\/\/bentego.com\/tr\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/bentego.com\/tr\/#\/schema\/person\/0348418b7b0cbca83fdd7a899d54821e\",\"name\":\"Bentego\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Streaming with Continuous Airflow - Bentego","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/","og_locale":"tr_TR","og_type":"article","og_title":"Streaming with Continuous Airflow - Bentego","og_description":"Streaming with Continuous Airflow Real-time data pipelines are increasingly crucial in today\u2019s data-driven systems. 
Traditional batch processing (e.g., nightly Spark jobs) introduces [&hellip;]","og_url":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/","og_site_name":"Bentego","article_published_time":"2025-08-25T08:17:42+00:00","article_modified_time":"2025-10-20T16:09:12+00:00","og_image":[{"width":2400,"height":1600,"url":"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png","type":"image\/png"}],"author":"Bentego","twitter_card":"summary_large_image","twitter_misc":{"Yazan:":"Bentego","Tahmini okuma s\u00fcresi":"17 dakika"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#article","isPartOf":{"@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/"},"author":{"name":"Bentego","@id":"https:\/\/bentego.com\/tr\/#\/schema\/person\/0348418b7b0cbca83fdd7a899d54821e"},"headline":"Streaming with Continuous Airflow","datePublished":"2025-08-25T08:17:42+00:00","dateModified":"2025-10-20T16:09:12+00:00","mainEntityOfPage":{"@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/"},"wordCount":3671,"publisher":{"@id":"https:\/\/bentego.com\/tr\/#organization"},"image":{"@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage"},"thumbnailUrl":"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png","keywords":["apache hava akimi","apache kafka","pyspark","Veri Ak\u0131\u015f\u0131","Veri M\u00fchendisli\u011fi"],"articleSection":["Blog"],"inLanguage":"tr"},{"@type":"WebPage","@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/","url":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/","name":"Streaming with Continuous Airflow - Bentego","isPartOf":{"@id":"https:\/\/bentego.com\/tr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage"},"image":{"@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage"},"thumbnailUrl":"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png","datePublished":"2025-08-25T08:17:42+00:00","dateModified":"2025-10-20T16:09:12+00:00","breadcrumb":{"@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#breadcrumb"},"inLanguage":"tr","potentialAction":[{"@type":"ReadAction","target":["https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/"]}]},{"@type":"ImageObject","inLanguage":"tr","@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#primaryimage","url":"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png","contentUrl":"https:\/\/bentego.com\/wp-content\/uploads\/2025\/06\/Frame-82__.png","width":2400,"height":1600},{"@type":"BreadcrumbList","@id":"https:\/\/bentego.com\/tr\/streaming-with-continuous-airflow\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/bentego.com\/tr\/"},{"@type":"ListItem","position":2,"name":"Streaming with Continuous Airflow"}]},{"@type":"WebSite","@id":"https:\/\/bentego.com\/tr\/#website","url":"https:\/\/bentego.com\/tr\/","name":"Bentego","description":"Turning data into enterprise 
value","publisher":{"@id":"https:\/\/bentego.com\/tr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/bentego.com\/tr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"tr"},{"@type":"Organization","@id":"https:\/\/bentego.com\/tr\/#organization","name":"Bentego","url":"https:\/\/bentego.com\/tr\/","logo":{"@type":"ImageObject","inLanguage":"tr","@id":"https:\/\/bentego.com\/tr\/#\/schema\/logo\/image\/","url":"https:\/\/bentego.com\/wp-content\/uploads\/2025\/05\/logo-bentego.svg","contentUrl":"https:\/\/bentego.com\/wp-content\/uploads\/2025\/05\/logo-bentego.svg","width":433,"height":109,"caption":"Bentego"},"image":{"@id":"https:\/\/bentego.com\/tr\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/bentego.com\/tr\/#\/schema\/person\/0348418b7b0cbca83fdd7a899d54821e","name":"Bentego"}]}},"_links":{"self":[{"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/posts\/6228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/comments?post=6228"}],"version-history":[{"count":5,"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/posts\/6228\/revisions"}],"predecessor-version":[{"id":6239,"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/posts\/6228\/revisions\/6239"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/media\/5922"}],"wp:attachment":[{"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/media?parent=6228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/categories?post=6228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bentego.com\/tr\/wp-json\/wp\/v2\/tags?post=6228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}