IoT Resources

DZone's Featured IoT Resources

Orange Pi Cluster With Docker Swarm and MariaDB

By Alejandro Duarte CORE

Building a cluster of single-board mini-computers is an excellent way to explore and learn about distributed computing. With the scarcity of Raspberry Pi boards, and the prices starting to get prohibitive for some projects, alternatives such as Orange Pi have gained popularity. In this article, I’ll show you how to build a (surprisingly cheap) 4-node cluster packed with 16 cores and 4GB RAM to deploy a MariaDB replicated topology that includes three database servers and a database proxy, all running on a Docker Swarm cluster and automated with Ansible. This article was inspired by a member of the audience who asked my opinion about Orange Pi during a talk I gave in Colombia. I hope this completes the answer I gave you. What Is a Cluster? A cluster is a group of computers that work together to achieve a common goal. In the context of distributed computing, a cluster typically refers to a group of computers that are connected to each other and work together to perform computation tasks. Building a cluster allows you to harness the power of multiple computers to solve problems that a single computer cannot handle. For example, a database can be replicated in multiple nodes to achieve high availability—if one node fails, other nodes can take over. It can also be used to implement read/write splitting to make one node handle writes, and another reads in order to achieve horizontal scalability. What Is Orange Pi Zero2? The Orange Pi Zero2 is a small single-board computer that runs on the ARM Cortex-A53 quad-core processor. It has 512MB or 1GB of DDR3 RAM, 100Mbps Ethernet, Wi-Fi, and Bluetooth connectivity. The Orange Pi Zero2 is an excellent choice for building a cluster due to its low cost, small size, and good performance. The only downside I found was that the Wi-Fi connection didn’t seem to perform as well as with other single-board computers. From time to time, the boards disconnect from the network, so I had to place them close to a Wi-Fi repeater. This could be a problem with my setup or with the boards. I’m not entirely sure. Having said that, this is not a production environment, so it worked pretty well for my purposes. What You Need Here are the ingredients: Orange Pi Zero2: I recommend the 1GB RAM variant and try to get at least 4 of them. I recently bought 4 of them for €30 each. Not bad at all! Give it a try! MicroSD cards: One per board. Try to use fast ones — it will make quite a difference in performance! I recommend at least 16GB. For reference, I used SanDisk Extreme Pro Micro/SDXC with 32GB, which offers a write speed of 90 MB/s and reads at 170 MB/s. A USB power hub: To power the devices, I recommend a dedicated USB power supply. You could also just use individual chargers, but the setup will be messier and require a power strip with as many outlets as devices as you have. It’s better to use a USB multi-port power supply. I used an Anker PowerPort 6, but there are also good and cheaper alternatives. You’ll have to Google this too. Check that each port can supply 5V and at least 2.4A. USB cables: Each board needs to be powered via a USB-C port. You need a cable with one end of type USB-C and the other of the type your power hub accepts. Bolts and nuts: To stack up the boards. Heat sinks (optional): These boards can get hot. I recommend getting heat sinks to help with heat dissipation. Materials needed for building an Orange Pi Zero2 cluster Assembling the Cluster One of the fun parts of building this cluster is the physical assembly of the boards on a case or some kind of structure that makes them look like a single manageable unit. Since my objective here is to keep the budget as low as possible, I used cheap bolts and nuts to stack the boards one on top of the other. I didn’t find any ready-to-use cluster cases for the Orange Pi Zero2. One alternative is to 3D-print your own case. When stacking the boards together, keep an eye on the antenna placement. Avoid crushing the cable, especially if you installed heat sinks. An assembled Orange Pi Zero2 cluster with 4 nodes Installing the Operating System The second step is to install the operating system on each microSD card. I used Armbian bullseye legacy 4.9.318. Download the file and use a tool like balenaEtcher to make bootable microSD cards. Download and install this tool on your computer. Select the Armbian image file and the drive that corresponds to the micro SD card. Flash the image and repeat the process for each micro SD card. Configuring Orange Pi WiFi Connection (Headless) To configure the Wi-Fi connection, Armbian includes the /boot/armbian_first_run.txt.template file which allows you to configure the operating system when it runs for the first time. The template includes instructions, so it’s worth checking. You have to rename this file to armbian_first_run.txt. Here’s what I used: Plain Text FR_general_delete_this_file_after_completion=1 FR_net_change_defaults=1 FR_net_ethernet_enabled=0 FR_net_wifi_enabled=1 FR_net_wifi_ssid='my_connection_id>' FR_net_wifi_key='my_password' FR_net_wifi_countrycode='FI' FR_net_use_static=1 FR_net_static_gateway='192.168.1.1' FR_net_static_mask='255.255.255.0' FR_net_static_dns='192.168.1.1 8.8.8.8' FR_net_static_ip='192.168.1.181' Use your own Wi-Fi details, including connection name, password, country code, gateway, mask, and DNS. I wasn’t able to read the SD card from macOS. I had to use another laptop with Linux on it to make the changes to the configuration file on each SD card. To mount the SD card on Linux, run the following command before and after inserting the SD card and see what changes: Shell sudo fdisk -l I created a Bash script to automate the process. The script accepts as a parameter the IP to set. For example: Shell sudo ./armbian-setup.sh 192.168.1.181 I run this command on each of the four SD cards changing the IP address from 192.168.1.181 to 192.168.1.184. Connecting Through SSH Insert the flashed and configured micro SD cards on each board and turn the power supply on. Be patient! Give the small devices time to boot. It can take several minutes the first time you boot them. An Orange Pi cluster running Armbian Use the ping command to check whether the devices are ready and connected to the network: Shell ping 192.168.1.181 Once they respond, connect to the mini-computers through SSH using the root user and the IP address that you configured. For example: Shell ssh root@192.168.1.181 The default password is: Plain Text 1234 You’ll be presented with a wizard-like tool to complete the installation. Follow the steps to finish the configuration and repeat the process for each board. Installing Ansible Imagine you want to update the operating system on each machine. You’ll have to log into a machine and run the update command and end the remote session. Then repeat for each machine in the cluster. A tedious job even if you have only 4 nodes. Ansible is an automation tool that allows you to run a command on multiple machines using a single call. You can also create a playbook, a file that contains commands to be executed in a set of machines defined in an inventory. Install Ansible on your working computer and generate a configuration file: Shell sudo su ansible-config init --disabled -t all > /etc/ansible/ansible.cfg exit In the /etc/ansible/ansible.cfg file, set the following properties (enable them by removing the semicolon): Plain Text host_key_checking=False become_allow_same_user=True ask_pass=True This will make the whole process easier. Never do this in a production environment! You also need an inventory. Edit the /etc/ansible/hosts file and add the Orange Pi nodes as follows: Plain Text ############################################################################## # 4-node Orange Pi Zero 2 cluster ############################################################################## [opiesz] 192.168.1.181 ansible_user=orangepi hostname=opiz01 192.168.1.182 ansible_user=orangepi hostname=opiz02 192.168.1.183 ansible_user=orangepi hostname=opiz03 192.168.1.184 ansible_user=orangepi hostname=opiz04 [opiesz_manager] opiz01.local ansible_user=orangepi [opiesz_workers] opiz[02:04].local ansible_user=orangepi In the ansible_user variable, specify the username that you created during the installation of Armbian. Also, change the IP addresses if you used something different. Setting up a Cluster With Ansible Playbooks A key feature of a computer cluster is that the nodes should be somehow logically interconnected. Docker Swarm is a container orchestration tool that will convert your arrangement of Orange Pi devices into a real cluster. You can later deploy any kind of server software. Docker Swarm will automatically pick one of the machines to host the software. To make the process easier, I have created a set of Ansible playbooks to further configure the boards, update the packages, reboot or power off the machines, install Docker, set up Docker Swarm, and even install a MariaDB database with replication and a database cluster. Clone or download this GitHub repository: Shell git clone https://github.com/alejandro-du/orange-pi-zero-cluster-ansible-playbooks.git Let’s start by upgrading the Linux packages on all the boards: Shell ansible-playbook upgrade.yml --ask-become-pass Now configure the nodes to have an easy-to-remember hostname with the help of Avahi, and configure the LED activity (red LED activates on SD card activity): Shell ansible-playbook configure-hosts.yml --ask-become-pass Reboot all the boards: Shell ansible-playbook reboot.yml --ask-become-pass Install Docker: Shell ansible-playbook docker.yml --ask-become-pass Set up Docker Swarm: Shell ansible-playbook docker-swarm.yml --ask-become-pass Done! You have an Orange Pi cluster ready for fun! Deploying MariaDB on Docker Swarm I have to warn you here. I don’t recommend running a database on container orchestration software. That’s Docker Swarm, Kubernetes, and others. Unless you are willing to put a lot of effort into it. This article is a lab. A learning exercise. Don’t do this in production! Now let’s get back to the fun… Run the following to deploy one MariaDB primary server, two MariaDB replica servers, and one MaxScale proxy: Shell ansible-playbook mariadb-stack.yml --ask-become-pass The first time you do this, it will take some time. Be patient. SSH into the manager node: Shell ssh orangepi@opiz01.local Inspect the nodes in the Docker Swarm cluster: Shell docker node ls Inspect the MariaDB stack: Shell docker stack ps mariadb A cooler way to inspect the containers in the cluster is by using the Docker Swarm Visualizer. Deploy it as follows: Shell docker service create --name=viz --publish=9000:8080 --constraint=node.role==manager --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock alexellis2/visualizer-arm:latest On your working computer, open a web browser and go to this URL. You should see all the nodes in the cluster and the deployed containers. Docker Swarm Visualizer showing MariaDB deployed MaxScale is an intelligent database proxy with tons of features. For now, let’s see how to connect to the MariaDB cluster through this proxy. Use a tool like DBeaver, DbGate, or even a database extension for your favorite IDE. Create a new database connection using the following connection details: Host: opiz01.local Port: 4000 Username: user Password: password Create a new table: MariaDB SQL USE demo; CREATE TABLE messages( id INT PRIMARY KEY AUTO_INCREMENT, content TEXT NOT NULL ); Insert some data: MariaDB SQL INSERT INTO messages(content) VALUES ("It works!"), ("Hello, MariaDB"), ("Hello, Orange Pi"); When you execute this command, MaxScale sends it to the primary server. Now read the data: MariaDB SQL SELECT * FROM messages; When you execute this command, MaxScale sends it to one of the replicas. This division of reads and writes is called read-write splitting. The MaxScale UI showing a MariaDB cluster with replication and read-write splitting You can also access the MaxScale UI. Use the following credentials: Username: admin Password: mariadb Watch the following video if you want to learn more about MaxScale and its features. You won’t regret it! More

How to Stream Sensor Data to Apache Pinot for Real Time Analysis

By David G. Simmons CORE

This is part 2 of a multi-part sample application that I've been building using Kafka and Apache Pinot. In the first part, we built an NFC Badge Reader to record when badges are swiped. In this part, we will begin ingesting some environmental data from a sensor into the mix. I'm not going to spoil the fun by telling you where this is ultimately headed, but feel free to reach out to me on Twitter and see if you can guess! As always, all of the code is available on GitHub. Hardware Hardware is part of this demo because I like building hardware, and it’s always more fun to have something that you can actually interact with. The hardware we’ll be building is a CO2 sensor that reads the CO2 concentration in the air, the temperature, and the relative humidity, and sends it to a Kafka topic. The sensor is based on the SCD-30 sensor, which is a non-dispersive infrared (NDIR) sensor that measures the CO2 concentration in the air. It’s a bit expensive, but it is also an industrial-quality sensor that can be relied on to give accurate readings. You will see a lot of projects based on cheaper “eCO2” sensors, but they are not as accurate as the SCD-30. I have done extensive testing of a variety of CO2 sensors and the eCO2 sensors can be off by as much as 200%, which is not something you want to rely on if accuracy matters. The sensor is connected to a SparkFun ESP32 Thing Plus - Qwiic, which reads the sensor data and sends it to a Kafka topic. I used these 2 components specifically because they both implement the Qwiic connector, which makes it easy to connect them together. The Qwiic connector is a standard 4-pin connector that uses I2C and is used by a number of different sensors and boards. It eliminates the need for soldering since you can just plug the sensor into the board and start coding. The ESP-32 Thing Plus is what’s referred to as a "Feather" because of the form factor. Really, you can use any ESP-32 board for this but the Feather is a great board for this project because it has built-in WiFi and Bluetooth, which makes it easy to connect to the Internet. It also has a built-in battery charger, which makes it easy to power the sensor with a battery if you want to place it somewhere out of reach. I programmed it using the Arduino IDE, which is a great tool for prototyping. C++ #include <ArduinoJson.h> // Serialize and deserialize JSON #include "EspMQTTClient.h" // handle MQTT #include <Wire.h> // 2Wire protocol #include "SparkFun_SCD30_Arduino_Library.h" // The CO2 Sensor EspMQTTClient client( "SSID", // Your SSID "PASSWD", // Your WiFi Password "mqtt-broker", // MQTT Broker server ip or hostname "PinotClient", // Client name that uniquely identify your device 8883 // The MQTT port, default to 1883. this line can be omitted ); SCD30 airSensor; // our sensor void setup() { Serial.begin(115200); // Optional functionalities of EspMQTTClient client.enableDebuggingMessages(); // Enable debugging messages sent to serial output Wire.begin(); if (airSensor.begin() == false) { Serial.println("Air sensor not detected. Please check wiring. Freezing..."); while (1) // infinite loop of nothingness ; } } // This function is called once everything is connected (Wifi and MQTT) // WARNING : YOU MUST IMPLEMENT IT IF YOU USE EspMQTTClient void onConnectionEstablished() { while (true) { // do this forever while (airSensor.dataAvailable()) { int co2 = airSensor.getCO2(); float temp = airSensor.getTemperature(); float hum = airSensor.getHumidity(); StaticJsonDocument<96> doc; doc["sensor"] = "SCD-30"; doc["co2"] = co2; doc["temp"] = temp; doc["humid"] = hum; char buff[128]; serializeJson(doc, buff); serializeJson(doc, Serial); // this is for debugging Serial.println(); client.publish("co2", buff); // publish the data to the co2 topic } } } void loop() { client.loop(); } That’s the entire Arduino sketch. It connects to WiFi, connects to an MQTT broker, and then reads the sensor data and publishes it to the co2 topic. You can find the code on GitHub. The SCD-30 sensor can really only provide about one reading/second, so there’s no need to do anything fancy to make it faster. Getting the Readings Into Kafka As it turns out there isn’t a good way to get data straight from an Arduino device into Kafka which is why we sent it to the MQTT broker above. Now that it’s in the MQTT broker we have to get it out and feed it into our Kafka topic. One of the other drawbacks of using an Arduino device is that they are not very good at keeping time. It is possible to use Network Time Protocol (NTP) to keep the time, but it’s not very reliable. To get around these two problems, I wrote a small program in Go that reads the data from the MQTT broker, gives it an accurate timestamp, and then publishes it to a Kafka topic. To make things easier, we will reuse some code from our Badge Reader project. Go package main import ( "bufio" "crypto/tls" "encoding/json" "fmt" "os" "strings" "time" "github.com/confluentinc/confluent-kafka-go/kafka" MQTT "github.com/eclipse/paho.mqtt.golang" ) /* Struct to hold the data from the MQTT message */ type env struct { Sensor string `json:"sensor"` CO2 int32 `json:"co2"` Temp float32 `json:"temp"` Humid float32 `json:"humid"` Time int32 `json:"time"` } That struct should look familiar as it’s almost identical to the one we used in the Arduino code. The only difference is the Time field, which we will use to store the timestamp. We will also re-use all the code from the Badge Reader project to connect to Kafka and publish the data to the topic, so I won’t reproduce it here. Go /* Read MQTT messages from the specified broker and topic and send them to the kafka broker */ func ReadMQTTMessages(brokerUri string, username string, password string, topic string, qos byte) error { // Create an MQTT client options object opts := MQTT.NewClientOptions() // Set the broker URI, username, and password opts.AddBroker(brokerUri) opts.SetUsername(username) opts.SetPassword(password) opts.SetTLSConfig(&tls.Config{InsecureSkipVerify: true}) // Create an MQTT client client := MQTT.NewClient(opts) if token := client.Connect(); token.Wait() && token.Error() != nil { return token.Error() } // Subscribe to the specified topic with the specified QoS level if token := client.Subscribe(topic, qos, func(client MQTT.Client, message MQTT.Message) { envData := env{} json.Unmarshal(message.Payload(), &envData) envData.Time = int32(time.Now().UnixMilli()) mess, _ := json.Marshal(envData) fmt.Printf("Received message: %s\n", mess) sendToKafka("co_2", string(mess)) }); token.Wait() && token.Error() != nil { return token.Error() } // Wait for messages to arrive for { select {} } } func main() { err := ReadMQTTMessages("tcp://broker-address:8883", "", "", "co2", 0) if err != nil { fmt.Printf("Error reading MQTT messages: %v\n", err) } } The main() function calls the ReadMQTTMessages() function which connects to the MQTT broker and subscribes to the co2 topic. When a message is received it is parsed into the env struct, the timestamp is added, and then it is published to the Kafka topic co_2. Consuming the Data Now that we have the data in Kafka, we can begin consuming it into StarTree Cloud. Since this project is building on the previous Badge Reader project, I’m going to reuse that StarTree Cloud instance and add a new table to it. Since I’ve already been sending data to the Kafka topic, once I add the source (with the right credentials) StarTree Cloud should show me some data that it has read from the topic. After we click through to the table we can see the data that has been read from the Kafka topic. From there, we can go to the Pinot Query Editor and start writing queries. Conclusion In this post, we looked at how to use an Arduino device to read data from a sensor and send it to an MQTT broker. We then used a Go program to read the data from the MQTT broker, add a timestamp, and send it to a Kafka topic. Finally, we used StarTree Cloud to read the data from the Kafka topic and make it available for querying in Apache Pinot. More

User Data Governance and Processing Using Serverless Streaming

By Maharshi Jha

Unlocking the Potential of IoT Applications With Real-Time Alerting Using Apache Kafka Data Streams and KSQL

By Kiran Peddireddy

Shaping the Future of IoT: 7 MQTT Technology Trends in 2023

By Zaiming (stone) Shi

Handling Bad Messages via DLQ by Configuring JDBC Kafka Sink Connector

Any trustworthy data streaming pipeline needs to be able to identify and handle faults. Exceptionally while IoT devices ingest endlessly critical data/events into permanent persistence storage like RDBMS for future analysis via multi-node Apache Kafka cluster. (Please click here to read how to set up a multi-node Apache Kafka Cluster). There could be scenarios where IoT devices might send fault/bad events due to various reasons at the source points, and henceforth appropriate actions can be executed to correct it further. The Apache Kafka architecture does not include any filtering and error handling mechanism within the broker so that maximum performance/scale can be yielded. Instead, it is included in Kafka Connect, which is an integration framework of Apache Kafka. As a default behavior, if a problem arises as a result of consuming an invalid message, the Kafka Connect task terminates, and the same applies to JDBC Sink Connector. Kafka Connect has been classified into two categories, namely Source (to ingest data from various data generation sources and transport to the topic) and Sink (to consume data/messages from the topic and send them eventually to various destinations). Without implementing a strict filtering mechanism or exception handling, we can ingest/publishes messages inclusive of wrong formatted to the Kafka topic because the Kafka topic accepts all messages or records as byte arrays in key-value pairs. But by default, the Kafka Connect task stops if an error occurs because of consuming an invalid message, and on top of that JDBC sink connector additionally won’t work if there is an ambiguity in the message schema. The biggest difficulty with the JDBC sink connector is that it requires knowledge of the schema of data that has already landed on the Kafka topic. Schema Registry must therefore be integrated as a separate component with the exiting Kafka cluster to transfer the data into the RDBMS. Therefore, to sink data from the Kafka topic to the RDBMS, the producers must publish messages/data containing the schema. You could read here to learn streaming Data via Kafka JDBC Sink Connector without leveraging Schema Registry from the Kafka topic. Since Apache Kafka 2.0, Kafka Connect has integrated error management features, such as the ability to reroute messages to a dead letter queue. In the Kafka cluster, a dead letter queue (DLQ) is a straightforward topic that serves as the destination for messages that, for some reason, were unable to reach their intended recipients, especially for JDBC sink connector, tables in RDBMS There might be two major reasons why the JDBC Kafka sink connector stops working abruptly while consuming messages from the topic: Ambiguity between data types and the actual payload Junk data in payload or wrong schema There is no complicacy of DLQ configuration in the JDBC sink connector. The following parameters need to be added in the sink configuration file (.properties file): errors.tolerance=allerrors.deadletterqueue.topic.name= <<Name of the DLQ Toic>>errors.deadletterqueue.topic.replication.factor= <<No of replication>>Note:- No of replication should be equal or less then the number of Kafka broker in the cluster. The DLQ topic would be created automatically with the above-mentioned replication factor when we start the JDBC sink connector for the first time. When an error occurs, or bad data is encountered by the JDBC sink connector while consuming messages from the topic, these unprocessed messages/bad data would be forwarded straightly to the DLQ. Subsequently, correct messages or data will send to the respective RDBMS tables continuously and again in between. If bad messages are encountered, then the same would be forwarded to the DLQ and so on. After landing the bad or erroneous messages on the DLQ, we will have two options either manually introspect each message to understand the root cause of the error or implement a mechanism to reprocess the bad messages and push them eventually to the consumers for JDBC sink connector the destination should be RDBMS tables. Dead letter queues are not enabled by default in Kafka Connect due to the above reason. Even though Kafka Connect supports several error-handling strategies, such as dead letter queues, silently ignoring, and failing quickly, the adoption of DLQ would be the best approach while configuring the JDBC sink connector. Decoupling completely the bad/error messages handling from the normal messages/data transportation from the Kafka topic would boost the overall efficiency of the entire system as well as allow the development team to develop an independent error handling mechanism from easy maintainability perspectives. Hope you have enjoyed this read. Please like and share if you feel this composition is valuable.

By Gautam Goswami CORE

Securing MQTT With Username and Password Authentication

Authentication is the process of identifying a user and verifying that they have access to a system or server. It is a security measure that protects the system from unauthorized access and guarantees that only valid users are using the system. Given the expansive nature of the IoT industry, it is crucial to verify the identity of those seeking access to its infrastructure. Unauthorized entry poses significant security threats and must be prevented. And that's why IoT developers should possess a comprehensive understanding of the various authentication methods. Today, I'll explain how authentication works in MQTT, what security risks it solves, and introduce the first authentication method: password-based authentication. What Is Authentication in MQTT? Authentication in MQTT refers to the process of verifying the identity of a client or a broker before allowing them to establish a connection or interact with the MQTT network. It is only about the right to connect to the broker and is separate from authorization, which determines which topics a client is allowed to publish and subscribe to. The authorization will be discussed in a separate article in this series. The MQTT broker can authenticate clients mainly in the following ways: Password-based authentication: The broker verifies that the client has the correct connecting credentials: username, client ID, and password. The broker can verify either the username or client ID against the password. Enhanced authentication (SCRAM): This authenticates the clients using a back-and-forth challenge-based mechanism known as Salted Challenge Response Authentication Mechanism. Other methods include Token Based Authentication like JWT, and also HTTP hooks, and more. In this article, we will focus on password-based authentication. Password-Based Authentication Password-based authentication aims to determine if the connecting party is legitimate by verifying that he has the correct password credentials. In MQTT, password-based authentication generally refers to using a username and password to authenticate clients, which is also recommended. However, in some scenarios, some clients may not carry a username, so the client ID can also be used as a unique identifier to represent the identity. When an MQTT client connects to the broker, it sends its username and password in the CONNECT packet. The example below shows a Wireshark capture of the CONNECT packet for a client with the corresponding values of client1, user, and MySecretPassword. After the broker gets the username (or client ID) and password from the CONNECT packet, it needs to look up the previously stored credentials in the corresponding database according to the username, and then compare it with the password provided by the client. If the username is not found in the database, or the password does not match the credentials in the database, the broker will reject the client's connection request. This diagram shows a broker using PostgreSQL to authenticate the client's username and password. The password-based authentication solves one security risk. Clients that do not hold the correct credentials (Username and Password) will not be able to connect to the broker. However, as you can see in the Wireshark capture, a hacker who has access to the communication channel can easily sniff the packets and see the connect credentials because everything is in plaintext. We will see in a later article in this series how we can solve this problem using TLS (Transport Layer Security). Secure Your Passwords With Salt and Hash Storing passwords in plaintext is not considered secure practice because it leaves passwords vulnerable to attacks. If an attacker gains access to a password database or file, they can easily read and use the passwords to gain unauthorized access to the system. To prevent this from happening, passwords should instead be stored in a hashed and salted format. What is a hash? It is a function that takes some input data, applies a mathematical algorithm to the data, and then generates an output that looks like complete nonsense. The idea is to obfuscate the original input data and also the function should be one-way. That means that there is no way to calculate the input given the output. However, hashes by themselves are not secure and can be vulnerable to dictionary attacks as shown in the following example. Consider this sha256 hash: 8f0e2f76e22b43e2855189877e7dc1e1e7d98c226c95db247cd1d547928334a9 It looks secure; you cannot tell what the password is by looking at it. However, the problem is that for a given password, the hash always produces the same result. So, it is easy to create a database of common passwords and their hash values. Here is an example: A hacker could look up this hash in an online hash database and learn that the password is passw0rd. "Salting" a password solves this problem. A salt is a random string of characters that is added to the password before hashing. This makes each password hash unique, even if the passwords themselves are the same. The salt value is stored alongside the hashed password in the database. When a user logs in, the salt is added to their password, and the resulting hash is compared to the hash stored in the database. If the hashes match, the user is granted access. Suppose that we add a random string of text to the password before we perform the hash function. The random string is called the salt value. For example with a salt value of az34ty1, sha256(passw0rdaz34ty1) is 6be5b74fa9a7bd0c496867919f3bb93406e21b0f7e1dab64c038769ef578419d This is unlikely to be in a hash database since this would require a large number of database hash entries just for the single plaintext passw0rd value. Best Practices for Password-Based Authentication in MQTT Here are some key takeaways from what we’ve mentioned in this article, which can be the best practices for password-based authentication in MQTT: One of the most important aspects of password-based authentication in MQTT is choosing strong and unique passwords. Passwords that are easily guessable or reused across multiple accounts can compromise the security of the entire MQTT network. It is also crucial to securely store and transmit passwords to prevent them from falling into the wrong hands. For instance, passwords should be hashed and salted before storage, and transmitted over secure channels like TLS. In addition, it's a good practice to limit password exposure by avoiding hard-coding passwords in code or configuration files, and instead using environment variables or other secure storage mechanisms. Summary In conclusion, password-based authentication plays a critical role in securing MQTT connections and protecting the integrity of IoT systems. By following best practices for password selection, storage, and transmission, and being aware of common issues like brute-force attacks, IoT developers can help ensure the security of their MQTT networks. However, it's important to note that password-based authentication is just one of many authentication methods available in MQTT, and may not always be the best fit for every use case. For instance, more advanced methods like digital certificates or OAuth 2.0 may provide stronger security in certain scenarios. Therefore, it's important for IoT developers to stay up-to-date with the latest authentication methods and choose the one that best meets the needs of their particular application. Next, I'll introduce another authentication method: SCRAM. Stay tuned for it!

By Kary Ware

Real-Time Analytics for IoT

We’ve been hearing that the Internet of Things (IoT) would transform the way we live and work by connecting everyday devices to the internet for a long time now. While much of the promise of the IoT always seems to be "coming soon," the proliferation of IoT devices has already created a massive amount of data that needs to be processed, stored, and analyzed, in real-time. I’ve said for years—actually over a decade now—that if your IoT data isn’t timely, accurate, and actionable, you’re mostly wasting your time in collecting it. This is where the Apache Pinot® database comes in. Pinot is an open-source, distributed data store designed for real-time analytics. The high scalability, reliability, and low latency query response times of Pinot make it a great solution for processing massive amounts of IoT data. In this post, we will explore the benefits of using Pinot in IoT applications. IoT devices generate a massive amount of data, and traditional databases are not equipped to handle the scale and complexity. I’ve used a lot of solutions to collect, store and analyze IoT data, but Pinot is specifically designed for handling high-velocity data streams in real-time. With Pinot, IoT data can be ingested, processed, and analyzed in real-time. In addition to real-time processing, Pinot offers scalability and reliability. As the number of IoT devices and the amount of data they generate continues to grow, it becomes critical to have a system that can scale horizontally to handle the increasing load. Pinot can scale easily by adding more nodes to the cluster, and it also provides fault tolerance, ensuring that data is not lost in the event of a node failure. Some Background What Is IoT? If we’re going to talk about IoT and Pinot, it’s probably best to give at least a bit of context on what IoT actually is and is not. IoT, short for the Internet of Things, refers to a network of physical devices, vehicles, home appliances, and other items embedded with sensors, software, and network connectivity. These devices can communicate with each other and share data over the internet. The range of IoT devices is diverse, ranging from smartwatches and fitness trackers to smart home devices like thermostats and security cameras to industrial machines and city infrastructure. The IoT market is expected to grow rapidly in the coming years, with estimates suggesting that there will be over 27 billion IoT devices by 2025. The significance of IoT lies in the ability to collect and analyze data from a wide range of sources in real-time. This data can be used to gain insights, optimize processes, improve decision-making, and enhance user experiences. For example, in the healthcare industry, IoT devices can monitor vital signs and other health metrics, alerting doctors or caregivers in case of abnormal readings. In the retail industry, IoT sensors can track inventory levels and customer behavior, enabling retailers to optimize store layouts and product offerings. Some retail establishments are already using IoT devices to handle increases or decreases in customer traffic in stores. In the transportation industry, IoT devices can monitor vehicle performance and location, enabling fleet managers to improve efficiency and safety. Most modern cars are already equipped with IoT devices that can monitor and report on a wide range of vehicle metrics, including fuel consumption, tire pressure, and engine performance, and almost all over-the-road trucks are already reporting vast amounts of telemetry data to their fleet managers. What Is Apache Pinot? Pinot is an open-source distributed data store that is purpose-built for real-time analytics. Originally developed at LinkedIn, Pinot has since become an Apache Software Foundation project and is used by a growing number of companies and organizations for a variety of use cases. Pinot is designed to handle large volumes of data in real-time and provides sub-second query latencies, making it ideal for use cases that require real-time analytics, such as IoT. One of the key features of Pinot is its distributed architecture. Pinot is designed to be horizontally scalable, which means that it can handle increasing amounts of data by adding more nodes to the cluster. This distributed architecture also provides fault tolerance, which means that it can continue to function even if one or more nodes in the cluster fail. Pinot stores data in columnar format, which allows for highly efficient querying and analysis. By storing data in columns rather than rows, Pinot can quickly scan through large amounts of data and provide compute aggregations or other complex calculations required for IoT data analysis. Pinot provides support for a variety of data types, including numerical, text, JSON, and geospatial data. It allows for nested queries, which can be useful for analyzing complex IoT data sets, and an emerging feature of generalized joins will make these query options even more powerful. Overall, Pinot is a powerful tool for analyzing and managing IoT data in real time. Advantages of Using Apache Pinot With IoT When it comes to using Pinot with IoT, there are a number of use cases and scenarios where the two technologies can be effectively combined. For example, in the industrial IoT space, Pinot can be used to analyze sensor data from manufacturing equipment to optimize performance and improve efficiency. Analyzing data from industrial equipment in real-time allows for much better predictive maintenance, more efficient usage patterns, and overall better utilization of resources. If you’re going to use Pinot with IoT, the first step is to identify the data sources that will be ingested into Pinot. In reality, you’ll want to back up even further and analyze the types of insights and efficiencies you’re looking for in your deployment. Once you’ve done this, you can begin to design the kind of data you’ll want to collect in order to facilitate those insights. This can include data from sensors, gateways, and other IoT devices. Once the data sources have been identified, Pinot can be configured to ingest the data in real time, processing and analyzing it as it is received. Once you’ve begun to ingest your data into Pinot, you can query it using SQL. With your queries in place, you can start identifying patterns in sensor data that can help detect anomalies in equipment performance and track changes in environmental conditions over time. However, using Apache Pinot with IoT naturally presents data security and privacy challenges. IoT devices are often connected to sensitive systems or contain personal data, making it important to ensure that data is properly secured and protected. Organizations need to implement robust security measures to protect against unauthorized access and data breaches. Another challenge of using Pinot with IoT is the complexity of the data sets involved. IoT data can be highly complex and heterogeneous, consisting of a variety of data types and formats. This can make it difficult to analyze and extract insights from the data. Organizations need to have a clear understanding of the data they are working with and develop effective data management and analysis strategies to overcome these challenges. Despite these challenges, the benefits of using Pinot with IoT make it a powerful tool for organizations looking to leverage their IoT data. With its real-time analytics capabilities, distributed architecture, and support for complex queries, Pinot is well-suited for managing and analyzing the vast amounts of data generated by IoT devices. By implementing effective data management and security strategies, organizations can unlock the full potential of their IoT data and drive innovation and growth in their respective industries. Use Cases of Apache Pinot With IoT There are various use cases of Pinot with IoT, ranging from predictive maintenance in manufacturing to healthcare monitoring and analysis. Below are some detailed examples of how Pinot can be used in different IoT applications: Predictive maintenance in manufacturing: One of the most promising applications of Pinot in IoT is predictive maintenance in manufacturing. By collecting and analyzing real-time data from sensors and machines, Pinot can help predict when a machine is likely to fail and schedule maintenance before a breakdown occurs. This can improve equipment uptime and reduce maintenance costs. Smart city monitoring and management: Smart city applications are a rapidly expanding use case for IoT. Smart city data from sensors and devices are used to manage various aspects of city infrastructure such as traffic, parking, and waste management. Pinot can help analyze real-time data from multiple sources and provide insights that can be used to optimize city operations and improve citizen services. Real-time tracking and monitoring of vehicles: Another use case of Pinot in IoT is the monitoring and management of fleet vehicles. Pinot can be used to collect and analyze data from GPS trackers, vehicle sensors, and cameras to provide real-time insights into vehicle location, speed, and driving behavior. Combined with Smart City data such as real-time traffic insights, fleet managers can optimize routes, reroute deliveries, and optimize for external factors in real-time. This can help optimize fleet management and improve driver safety. Healthcare monitoring and analysis: Healthcare applications, where data from wearables, sensors, and medical devices can be used to monitor patients and analyze health outcomes in order to improve patient care and reduce errors. Conclusion I hope I have shown you how Pinot can provide you with a powerful toolset for managing and analyzing IoT data in real time. Its distributed architecture and fault-tolerant design make it an ideal choice for organizations looking to scale their data storage and processing capabilities as their IoT data grows. With its support for complex queries and SQL-like query language, Pinot offers a flexible and powerful platform for analyzing complex IoT data sets. As the IoT continues to grow and evolve, Pinot is well-positioned to become an increasingly important tool for managing and analyzing IoT data in real time. By embracing this technology and developing effective strategies for managing and analyzing IoT data, organizations can stay ahead of the curve and unlock new opportunities for growth and innovation. Try It Out Yourself Interested in seeing if Apache Pinot is a possible solution for you? Come join the community of users who are implementing Apache Pinot for real-time data analytics. Want to learn even more about it? Then be sure to attend the Real Time Analytics Summit in San Francisco!

By David G. Simmons CORE

Get Up to Speed With the Latest Cybersecurity Standard for Consumer IoT

With growing concern regarding data privacy and data safety today, Internet of Things (IoT) manufacturers have to up their game if they want to maintain consumer trust. This is the shared goal of the latest cybersecurity standard from the European Telecommunications Standards Institute (ETSI). Known as ETSI EN 303 645, the standard for consumer devices seeks to ensure data safety and achieve widespread manufacturer compliance. So, let’s dive deeper into this standard as more devices enter the home and workplace. The ETSI Standard and Its Protections It counts a long name but heralds an important era of device protection. ETSI EN 303 645 is a standard and method by which a certifying authority can evaluate IoT device security. Developed as an internationally applicable standard, ETSI offers manufacturers a baseline for security rather than a comprehensive set of precise guidelines. The standard may also lay the groundwork for various future IoT cybersecurity certifications in different regions around the world. For example, look at what’s happening in the European Union. Last September, the European Commission introduced a proposed Cyber Resilience Act, intended to protect consumers and businesses from products with inadequate security features. If passed, the legislation — a world-first on connected devices — will bring mandatory cybersecurity requirements for products with digital elements throughout their whole lifecycle. The prohibition of default and weak passwords, guaranteed support of software updates and mandatory testing for security vulnerabilities are just some of the proposals. Interestingly, these same rules are included in the ETSI standard. IoT Needs a Cybersecurity Standard Shockingly, a single home filled with smart devices could experience as many as 12,000 cyber attacks in a single week. While most of those cyber attacks will fail, the sheer number means some inevitably get through. The ETSI standard strives to keep those attacks out with basic security measures, many of which should already be common sense, but unfortunately aren’t always in place today. For example, one of the basic requirements of the ETSI standard is no universal default passwords. In other words, your fitness tracker shouldn’t have the same default password as every other fitness tracker of that brand on the market. Your smart security camera shouldn’t have a default password that anyone who owns a similar camera could exploit. It seems like that would be common sense for IoT manufacturers, but there have been plenty of breaches that occurred simply because individuals didn’t know to change the default passwords on their devices. Another basic requirement of ETSI is allowing individuals to delete their own data. In other words, the user has control over the data a company stores about them. Again, this is pretty standard stuff in the privacy world, particularly in light of regulations like Europe’s General Data Protection Regulation (GDPR) and California’s Consumer Privacy Act (CCPA). However, this is not yet a universal requirement for IoT devices. Considering how much health- and fitness-related data many of these devices collect, consumer data privacy needs to be more of a priority. Several more rules in ETSI have to do with the software installed on such devices and how the provider manages security for the software. For example, there needs to be a system for reporting vulnerabilities. The provider needs to keep the software up to date and ensure software integrity. We would naturally expect these kinds of security measures for nearly any software we use, so the standard is basically just a minimum for data protection in IoT. Importantly, the ETSI standard covers pretty much everything that could be considered a smart device, including wearables, smart TVs and cameras, smart home assistants, smart appliances, and more. The standard also applies to connected gateways, hubs, and base stations. In other words, it covers the centralized access point for all of the various devices. Why Device Creators Should Implement the Standard Today Just how important is the security standard? Many companies are losing customers today due to a lack of consumer trust. There are so many stories of big companies like Google and Amazon failing to adequately protect user data, and IoT in particular has been in the crosshairs multiple times due to privacy concerns. An IoT manufacturer that doesn’t want to lose business, face fines and lawsuits, and damage the company's reputation should consider implementing the ETSI standard as a matter of course. After all, these days a given home might have as many as 16 connected devices, each an entry point into the home network. A company might have one laptop per employee but two, three, or more other smart devices per employee. And again, each smart device is a point of entry for malicious hackers. Without a comprehensive cybersecurity standard like ETSI EN 303 645, people who own unprotected IoT devices need to worry about identity theft, ransomware attacks, data loss and much more. How to Test and Certify Based on ETSI Certification is fairly basic and occurs in five steps: Manufacturers have to understand the 33 requirements and 35 recommendations of the ETSI standard and design devices accordingly. Manufacturers also have to buy an IoT platform that has been built with the ETSI standard in mind, since the standard will fundamentally influence the way the devices are produced and how they operate within the platform. Next, any IoT manufacturer trying to meet the ETSI standard has to fill out documents that provide information for device evaluation. The first document is the Implementation Conformance Statement, which shows which requirements and recommendations the IoT device does or doesn’t meet. The second is the Implementation eXtra Information for Testing, which provides design details for testing. A testing provider will next evaluate and test the product based on the two documents and give a report. The testing provider will provide a seal or other indication that the product is ETSI EN 303 645-compliant. With new regulations on the horizon, device manufacturers and developers should see it as best practice to get up to speed with this standard. Better cybersecurity is not only important for consumer protection but brand reputation. Moreover, this standard can provide a basis for stricter device security certifications and measures in the future. Prepare today for tomorrow.

By Carsten Rhod Gregersen

Your Pi-Hole Is a Rich Source of Data

While a lot of my inspiration for blog posts come from talking with New Relic users, it's hard to share them as examples because their so specific and often confidential. So I find myself struggling more to find a generic "for instance" that's easy to understand and accessible to all everyone. Which should explain why I use my home environment as the sample use case so often. Even if you don't have exactly the same gear or setup I do, it's likely you have something analogous. On top of that, if you don't have the specific element I'm discussing, many times I believe it's something you ought to consider. That brings us to my example today: Pi-Hole. Pi-Hole acts as a first-level DNS server for your network. But what it REALLY does is make your network faster and safer by blocking requests to malicious, unsavory, or just plain obnoxious sites. If you’re using Pi-Hole, it’ll be most noticeable in the ways advertisements on a webpage load. BEFORE: pop-overs and hyperbolic ads AFTER: No pop-overs, spam ads blocked But under the hood, it’s even more significant. BEFORE: 45 seconds to load AFTER: 6 seconds to load Look in the lower-right corner of each of those images. Load time without Pi-Hole was over 45 seconds. With it, the load time was 6 seconds. You may there are many pages like this, but the truth is web pages link to these sites all the time. Here's the statistics from my house on a typical day. How Does the Pi-Hole API Work? If you have Pi-Hole running, you get to the API by going to http://<your pi-hole url>/admin/api.php?summaryRaw. The result will look something like this: {”domains_being_blocked”:115897,”dns_queries_today”:284514,”ads_blocked_today”:17865,”ads_percentage_today”:6.279129,”unique_domains”:14761,”queries_forwarded”:216109,”queries_cached”:50540,”clients_ever_seen”:38,”unique_clients”:22,”dns_queries_all_types”:284514,”reply_NODATA”:20262,”reply_NXDOMAIN”:19114,”reply_CNAME”:16364,”reply_IP”:87029,”privacy_level”:0,”status”:”enabled,””gravity_last_updated”:{”file_exists”:true,”absolute”:1567323672,”relative”:{”days”:”3,””hours”:”09,””minutes”:”53”}} Let's format the JSON data so it looks a little prettier: The point is, once we have access to all that JSON-y goodness, it's almost trivial (using the Flex integration, which I discussed in this series) to collect and send into New Relic, to provide further insight into how your network is performing. At that point, you can start to include the information in graphs like this: Assuming you have the New Relic infrastructure agent installed on on any system on the network that can access your pihole (and once again, if you need help getting that set up, check out my earlier blog post here) you have relatively few steps to get up and running. First, the YAML file would look like this (you can also find it on the New Relic Flex GitHub repo in the examples folder). integrations: - name: nri-flex config: name: pihole_simple apis: - name: pihole_simple url: http://pi.hole/admin/api.php?summaryRaw&auth= #<your API Key Here> headers: accept: application/json remove_keys: - timestamp Next, the NRQL you'd need to set up two different charts are as follows: For the "Query Volume" chart: From pihole_simpleSample SELECT average(dns_queries_all_replies), average(dns_queries_today), average(queries_forwarded), average(queries_cached), average(dns_queries_all_types) TIMESERIES For the "Blocking Activity" chart: From pihole_simpleSample SELECT average(ads_blocked_today), average(domains_being_blocked) TIMESERIES This is, of course, only the start of the insights you can gain from your Pi-Hole server (and by extension, ANY device or service that has an API with endpoints that provide data). If you find additional use cases, feel free to reach out to me in the comments below, on social media, or when you see me at a conference or meet-up.

By Leon Adato

High Throughput vs. Low Latency in Data Writing: A Way to Have Both

This article is about how Apache Doris helps you import data and conduct Change Data Capture (CDC) from upstream databases like MySQL to Doris based on Flink streaming. But first of all, you might ask: What is Apache Doris and why would I bother to do so? Well, Apache Doris is an open-source real-time analytical data warehouse that supports both high-concurrency point queries and high-throughput complex analysis. It provides sub-second analytic query capabilities and comes in handy in multi-dimensional analysis, dashboarding, and other real-time data services. Overview How to perform end-to-end data synchronization within seconds How to ensure real-time data visibility How to smoothen the writing of massive small files How to ensure end-to-end Exactly-Once processing Real-Timeliness Stream Write The Flink-Doris Connector in Doris used to follow a "Cache and Batch Write" method for data ingestion. However, that requires a wise choice of batch size and batch write interval; otherwise things could go wrong. For example, if the batch size is too large, OOM errors could occur. On the other hand, frequent writes could lead to too many data versions generated. To avoid such troubles, Doris implements a Stream Write method, which works as follows: A Flink task, once started, asynchronously initiates a Stream Load HTTP request. The data is transmitted to Doris via the chunked transfer encoding mechanism of HTTP. The HTTP request ends at Checkpoint, which means the Stream Load task is completed. Meanwhile, the next Stream Load request will be asynchronously initiated. Repeat the above steps. Transaction Processing Quick Aggregation of Data Versions Highly concurrent writing of small files can generate too many data versions in Doris and slow down data queries. Thus, Doris has enhanced its data compaction capability in order to quickly aggregate data. Firstly, Doris introduced Quick Compaction. Specifically speaking, data compaction will be triggered once data versions increase. Meanwhile, by scanning the metadata of tablets, Doris can identify those tablets with too many data versions and conduct compaction correspondingly. Secondly, for the writing of small files, which happens in high concurrency and frequency, Doris implements Cumulative Compaction. It isolates these compaction tasks from the heavyweight Base Compaction from a scheduling perspective to avoid mutual influence between them. Last but not least, Doris adopts a tiered data aggregation method, which ensures that each aggregation only involves files of similar sizes. This greatly reduces the total number of aggregation tasks and the CPU usage of the system. Exactly-Once The Exactly-Once semantics means that the data will be processed once and only once. It prevents the data from getting reprocessed or lost even if the machine or application fails. Flink implements a 2PC protocol to realize the Exactly-Once semantics of Sink operators. Based on this, the Flink-Doris Connector in Doris implements Stream Load 2PC to deliver Exactly-Once processing. The details are as follows: A Flink task will initiate a Stream Load PreCommit request once it is started. Then, a transaction will be opened, and data will be continuously sent to Doris via the chunked mechanism of HTTP. The HTTP request ends at Checkpoint and the Stream Load is completed. The transaction status will be set to Pre-Committed. At this time, the data has been written to BE and become invisible to users. The Checkpoint initiates a request and changes the transaction status to Committed. After this, the data will become visible to users. In the case of Flink application failures, if the previous transaction is in Pre-Committed status, the Checkpoint will initiate a rollback request and change the transaction status to Aborted. Performance of Doris in High-Concurrency Scenarios Scenario Description Import data from Kafka using Flink. After ETL, use the Flink-Doris Connector for real-time data ingestion into Doris. Requirements The upstream data is written into Doris at a high frequency of 100,000 per second. To achieve real-time data visibility, the upstream and downstream data needs to be synchronized within around 5s. Flink Configurations Concurrency: 20 Checkpoint Interval: 5s Here's how Doris does it: Compaction Real-TimelinessAs the result shows, Doris manages to aggregate data quickly and keep the number of data versions in tablets below 50. Meanwhile, the Compaction Score remains stable. CPU UsageAfter optimizing the compaction strategy of small files, Doris reduces CPU usage by 25%. Query LatencyBy reducing the CPU usage and the number of data versions, Doris arranges the data more orderly and thus enables much lower query latency. Performance of Doris in Low-Latency Scenarios (High-Level Stress Test) Description Single-BE, single-tablet Stream Load stress test on the client side Data real-timeliness <1s Here are the Compaction Scores before and after optimization: Suggestions for Using Doris Low-Latency ScenariosAs for scenarios requiring real-time data visibility (such as data synchronization within seconds), the files in each ingestion are usually small in size. Thus, it is recommended to reduce cumulative_size_based_promotion_min_size_mbytefrom the default value of 64 to 8 (measured in MB). This can greatly improve the compaction performance. High-Concurrency ScenariosFor highly concurrent writing scenarios, it is recommended to reduce the frequency of Stream Load by increasing the Checkpoint interval to 5–10s. This not only increases the throughput of Flink tasks, but also reduces the generation of small files and thus avoids extra pressure on compaction. In addition, for scenarios with less strict requirements for real-timeliness (such as data synchronization within minutes), it is recommended to increase the Checkpoint interval to 5–10 minutes. In this way, the Flink-Doris Connector can still ensure data integrity via the 2PC+Checkpoint mechanism. ConclusionApache Doris realizes data real-timeliness by its Stream Write method, transaction processing capability, and aggregation of data versions. These techniques help it reduce memory and CPU usage, which enables lower latency. In addition, for data integrity and consistency, Doris implements Stream Load 2PC to guarantee that all data is processed exactly once. This is how Doris facilitates quick and safe data ingestion.

By Frank Z

Using AI To Optimize IoT at the Edge

As more companies combine Internet of Things (IoT) devices and edge computing capabilities, people are becoming increasingly curious about how they could use artificial intelligence (AI) to optimize those applications. Here are some thought-provoking possibilities. Improving IoT Sensor Inference Accuracy With Machine Learning Technology researchers are still in the early stages of investigating how to improve the performance of edge-deployed IoT sensors with machine learning. Some early applications include using sensors for image-classification tasks or those involving natural language processing. However, one example shows how people are making progress. Researchers at IMDEA Networks recognized that using IoT sensors for specific deep-learning tasks may mean the sensors cannot guarantee specific quality-of-service requirements, such as latency and inference accuracy. However, the people working on this project developed a machine learning algorithm called AMR² to help with this challenge. AMR² utilizes an edge computing infrastructure to make IoT sensor inferences more accurate while enabling faster responses and real-time analyses. Experiments suggested the algorithm improved inference accuracy by up to 40% compared to the results of basic scheduling tasks that did not use the algorithm. They found an efficient scheduling algorithm such as this one is essential for helping IoT sensors work properly when deployed at the edge. A project researcher pointed out that the AMR² algorithm could impact an execution delay if a developer used it for a service similar to Google Photos, which classifies images by the elements they include. A developer could deploy the algorithm to ensure the user does not notice such delays when using the app. Reducing Energy Usage of Connected Devices With AI at the Edge A 2023 study of chief financial officers at tech companies determined 80% expect revenue increases in the coming year. However, that’s arguably most likely to happen if employees understand customers’ needs and provide products or services accordingly. The manufacturers of many IoT devices intend for people to wear those products almost constantly. Some wearables detect if lone workers fall or become distressed or if people in physically demanding roles are becoming too tired and need to rest. In such cases, users must feel confident that their IoT devices will work reliably through their workdays and beyond. That’s one of the reasons why researchers explored how using AI at the edge could improve the energy efficiency of IoT devices deployed to study the effects of a sedentary lifestyle on health and how correct posture could improve outcomes. Any IoT device that captures data about how people live must collect data continuously, requiring few or no instances where information gathering stops because the device runs out of battery. In this case, subjects wore wireless devices powered by coin-cell batteries. Each of these gadgets had inertia sensors to collect accurate data about how much people moved throughout the day. However, the main problem was the batteries only lasted a few hours due to the large volume of data transmitted. For example, research showed a nine-channel motion sensor that reads 50 samples every second produces more than 100 MB of data daily. However, researchers recognized machine learning could enable the algorithms only to transfer critical data from edge-deployed IoT devices to smartphones or other devices that assist people in analyzing the information. They proceeded to use a pre-trained recurrent neural network and found the algorithm achieved real-time performance, improving the IoT devices’ functionality. Creating Opportunities for On-Device AI Training Edge computing advancements have opened opportunities to use smart devices in more places. For example, people have suggested deploying smart street lights that turn on and off in response to real-time traffic levels. Tech researchers and enthusiasts are also interested in the increased opportunities associated with AI training that happens directly on edge-deployed IoT devices. This approach could increase those products’ capabilities while reducing energy consumption and improving privacy. An MIT team studied the feasibility of training AI algorithms on intelligent edge devices. They tried several optimization techniques and came up with one that only required 157 KB of memory to train a machine-learning algorithm on a microcontroller. Other lightweight training methods typically require between 300-600 MB of memory, making this innovation a significant improvement. The researchers explained that any data generated for training stays on the device, reducing privacy concerns. They also suggested use cases where the training happens throughout normal use, such as if algorithms learn by what a person types on a smart keyboard. This approach had some undoubtedly impressive results. In one case, the team trained the algorithm for only 10 minutes, which was enough to allow it to detect people in images. This example shows optimization can go in both directions. Although the first two examples here focused on improving how IoT devices worked, this approach enhanced the AI training process. However, suppose developers train algorithms on IoT devices that will eventually use them to perform better. That’s a case where the approach mutually benefits AI algorithms and IoT-edge devices. How Will You Use AI to Improve How IoT-Edge Devices Work? These examples show some of the things researchers focused on when exploring how artificial intelligence could improve the functionality of IoT devices deployed at the edge. Let them provide valuable insights and inspiration about how you might get similar results. It’s almost always best to start with a clearly defined problem you want to solve. Then, start exploring how technology and innovative approaches could help meet that goal.

By Devin Partida

UUID: Coordination-Free Unique Keys

Let’s build an IoT application with weather sensors deployed around the globe. The sensors will collect data and we store the data along with the IDs of the sensors. We’ll run multiple database instances, and the sensors will write to the geographically closest database. All databases will regularly exchange data, so all the databases will eventually have data from all the sensors. We need each sensor to have a globally unique ID. How can we achieve it? For example, we could run a service assigning sensor IDs as a part of the sensor installation procedure. It would mean additional architectural complexity, but it's doable. Sensor IDs are immutable, so each sensor needs to talk to the ID service only once - right after the installation. That’s not too bad. What if we need to store a unique ID for each data reading? Hitting the centralized ID service whenever we need to store data is not an option. That would stress the ID service too much, and when the ID service is unavailable, no sensor could write any data. What are the possible solutions? In the simplest case, each sensor could talk to the remote ID service and reserve a block of IDs it could then assign locally without further coordination. When it exhausts the block, it asks the ID service for a new one. This strategy would reduce the load on the ID service, and sensors could function even when the ID service is temporarily unavailable. We could also generate local reading IDs and prefix them with our unique immutable sensor ID. We could also be smart and use fancy ID algorithms like FlakeIDs. The strategies mentioned aim to minimize the need for coordination while ensuring that the IDs are unique globally. The goal is to generate unique IDs without any coordination at all. This is what we call coordination-free unique IDs. UUID Enters the Scene Flip a coin 128 times and write down 1 for each head and 0 for each tail. This gives you a sequence of 128 1s and 0s, or 128 bits of randomness. That’s a space large enough that the probability of generating the same sequence twice is so extremely low that you can rule it out for practical purposes. How is that related to UUIDs? If you have ever seen a UUID then you know they look similar to this: 420cd09a-4d56-4749-acc2-40b2e8aa8c42. This format is just a textual representation of 128 bits. How does it work? The UUID string has 36 characters in total. If we remove the 4 dashes, which are there just to make it a bit more human-readable, we are left with 32 hexadecimal digits: 0-F. Each digit represents 4 bits and 32 * 4 bits = 128 bits. So UUIDs are 128-bit values. We often represent them as strings, but that's just a convenience. UUID has been explicitly designed to be unique and generated without coordination. When you have a good random generator, 128 random bits are enough to practically guarantee uniqueness. At the same time, 128 bits are not too much, so UUIDs do not occupy too much space when stored. UUID Versions There are multiple versions of UUIDs. Versions 1-5 are defined in RFC 4122 and they are the most widely used. Versions 6 - 8 are currently in draft status and might be approved in the future. Let's take a brief look at the different versions. Version 1 Version 1 is generated by using a MAC address and time as inputs. The MAC address is used to ensure uniqueness across multiple machines. The time ensures uniqueness across multiple processes on the same machine. Using the MAC means that generated UUIDs can be tracked to a specific machine. This can be useful occasionally, but it might not be desirable in other cases, as a MAC address can be considered private information. Interestingly enough, the time portion is not based on the usual Unix epoch, but it uses a count of 100-nanosecond intervals since 00:000:00.00 on the 15th of October 1582. What is special about October 1582? It's the Gregorian calendar reform. See version 7 for a UUID with a standard Unix epoch. Version 2 Version 2 is similar to version 1 but adds a local domain ID to the UUID. It's not widely used. Versions 3 and 5 These versions use a hash function to generate the UUID. The hash function is seeded with a namespace UUID and a name. The namespace UUID is used to ensure uniqueness across multiple namespaces. The name is used to ensure uniqueness within a namespace. Version 3 uses MD5 as a hash function, while version 5 uses SHA-1. SHA-1 generates 160 bits, so the digest is truncated to 128 bits. Version 4 Version 4 UUID is probably the most popular one. It relies solely on a random generator to generate UUIDs, similar to the coin flip example above. This means that the quality of the random generator is critical. Version 6 Version 6 is similar to Version 1 but has a different ordering of bytes. It encodes the time from the most significant to the least significant. This allows sorting UUIDs correctly by time when you sort just bytes representing the UUIDs. Version 7 Version 7 uses a 48-bit timestamp and random data. Unlike versions 1, 2, or 6, it uses a standard Unix epoch in milliseconds. It also uses a random generator instead of a MAC address. Version 8 Version 8 is meant to be used for experimental and private use. Security Considerations UUIDs are designed to be unique, but they are not designed to be secret. What's the difference? If you generate a UUID then you can assume it's different from any other UUID generated before or after, but you should not treat them as a password or a secret session identifier. This is what RFC 4122 says about this: Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation. UUID in QuestDB UUIDs are popular synthetic IDs because they can be generated without any coordination and do not use too much space. QuestDB users often store UUIDs, but until recently, QuestDB did not have first-class support. Most users stored UUIDs in a string column. It makes sense because as we have seen above UUIDs have a canonical textual representation. Storing UUIDs in a string column is possible, but it's inefficient. Let's do some math: We already know each UUID has 128 bits, that's 16 bytes. The canonical textual representation of UUID has 36 characters. QuestDB uses UTF-16 encoding for strings, so each ASCII character uses 2 bytes. There is also a fixed cost of 4 bytes per string stored. So it takes 36 * 2 + 4 = 76 bytes to store a single UUID which contains just 16 bytes of information! It's not just wasting disk space. QuestDB must read these bytes when evaluating a SQL predicate, joining tables, or calculating an aggregation. Thus storing UUIDs as strings also makes your queries slower! That's why QuestDB 6.7 implemented UUID as a first-class data type. This allows user applications to declare a column as UUID and then each UUID stored will use only 16 bytes. Thanks to this, SQL queries will be faster. Demo time The demo creates tables occupying just under 100 GB of disk space. Make sure you have enough disk space available. You might also need to increase the query timeout via the query.timeout.sec property. See Configuration for more details. Alternatively, you can change the long_sequence() function to create a smaller number of rows. Let’s create a table with a single string column and populate it with 1 billion random UUIDs. The column is defined as the string type, so the UUIDs will be stored as strings. SQL CREATE TABLE tab_s (s string); INSERT INTO tab_s SELECT rnd_uuid4() FROM long_sequence(1000000000); Let’s try to query it: SQL SELECT * FROM tab_s WHERE s = 'ab632aba-be36-43e5-a4a0-4895e9cd3f0d'; It’s taking around 2.2s. It’s not terrible given it’s a full-table scan over one billion strings, but we can do better! How much better? Let’s see. Create a new table with a UUID column: SQL CREATE TABLE tab_u (u uuid); Populate it with UUID values from the first table: SQL INSERT INTO tab_u SELECT * FROM tab_s; The newly created table has the same values as the first table, but the column is defined as UUID instead of string, so it eliminates the waste we discussed above. Let’s see how the predicate performs now: SQL SELECT * FROM tab_u WHERE u = 'ab632aba-be36-43e5-a4a0-4895e9cd3f0d'; This query takes around 380ms on my test box. That’s almost 6x better than the original 2.2 seconds! Speed is the key to any real-time analysis so this is certainly important. Let’s check disk space. The du command shows the space used by each table. First, the table with strings: Shell $ du -h 79G ./default 79G . The table with UUID: Shell $ du -h 15G ./default 15G . Declaring the column as UUID saved 64GB of disk space! Using UUID optimizes query performance and is cost-effective. Last but not least, predicates on UUID values will become even faster in future QuestDB versions as we are looking at how to vectorize them by using the SIMD instructions! Conclusion We use UUIDs to generate globally unique IDs without any coordination. They are 128 bits long so they do not use too much space. This makes them suitable for distributed applications, IoT, cryptocurrencies, or decentralized finance. When your application stores UUIDs, tell your database it’s a UUID, do not store them in a string column. You will save disk space and CPU cycles.

By Jaromir Hamala

How To Test IoT Security

Though the Internet of Things (IoT) has redefined our lives and brought a lot of benefits, it has a large attack surface area and is not safe until it is secure. IoT devices are an easy target for cybercriminals and hackers if not properly secured. You may have serious problems with financial and confidential data being invaded, stolen, or encrypted. It is difficult to spot and discuss risks for organizations, let alone build a comprehensive methodology for dealing with them, without practical knowledge of what IoT security is and testing it. Realizing the security threats and how to avoid them is the first step, as Internet of Things solutions require significantly more testing than before. Integrated security is frequently lacking when it comes to introducing new features and products to the market. What Is IoT Security Testing? IoT security testing is the practice of evaluating cloud-connected devices and networks to reveal security flaws and prevent devices from being hacked and compromised by a third party. The biggest IoT security risks and challenges can be addressed through a focused approach with the most critical IoT vulnerabilities. Most Critical IoT Security Vulnerabilities There are typical issues in security analysis faced by organizations that are missed even by experienced companies. Adequate testing Internet of Things (IoT) security in networks and devices is required, as any hack into the system can bring a business to a standstill, leading to a loss in revenue and customer loyalty. The top ten common vulnerabilities are as follows: 1. Weak Easy-to-Guess Passwords Absurdly simple and short passwords that put personal data at risk are among the primary IoT security risks and vulnerabilities for most cloud-connected devices and their owners. Hackers can co-opt multiple devices with a single guessable password, jeopardizing the entire network. 2. Insecure Ecosystem Interfaces Insufficient encryption and verification of the user’s identity or access rights in the ecosystem architecture, which is software, hardware, network, and interfaces outside of the device, enable the devices and associated components to get infected by malware. Any element in the broad network of connected technologies is a potential source of risk. 3. Insecure Network Services The services running on the device should be given special attention, particularly those that are open to the Internet and have a high risk of illegal remote control. Do not keep ports open, update protocols, and ban any unusual traffic. 4. Outdated Components Outdated software elements or frameworks make a device unprotected from cyberattacks. They enable third parties to interfere with the performance of the gadgets, operating them remotely or expanding the attack surface for the organization. 5. Insecure Data Transfer/Storage The more devices are connected to the network, the higher the level of data storage/exchange should be. A lack of secure encoding in sensitive data, whether it is at rest or transferred, can be a failure for the whole system. 6. Bad Device Management Bad device management happens because of a poor perception of and visibility into the network. Organizations have a bunch of different devices that they do not even know about, which are easy entry points for attackers. IoT developers are simply unprepared in terms of proper planning, implementation, and management tools. 7. Poor Secure Update Mechanism The ability to securely update the software, which is the core of any IoT device, reduces the chances of it being compromised. The gadget becomes vulnerable every time cybercriminals discover a weak point in security. Similarly, if it is not fixed with regular updates, or if there are no regular notifications of security-related changes, it can become compromised over time. 8. Inadequate Privacy Protection Personal information is gathered and stored in larger amounts on IoT devices than on smartphones. In case of improper access, there is always a threat of your information being exposed. It is a major privacy concern because most Internet of Things technologies are somehow related to monitoring and controlling gadgets at home, which can have serious consequences later. 9. Poor Physical Hardening Physical hardening is one of the major aspects of high security IoT devices since they are a cloud computing technology that operates without human intervention. Many of them are intended to be installed in public spaces (instead of private homes). As a result, they are created in a basic manner, with no additional level of physical security. 10. Insecure Default Settings Some IoT devices come with default settings that cannot be modified, or there is a lack of alternatives for operators when it comes to security adjustments. The initial configuration should be modifiable. Default settings that are invariant across multiple devices are insecure. Once guessed, they are used to hack into other devices. How To Protect IoT Systems and Devices Easy-to-use gadgets with little regard for data privacy make IoT security on smart devices tricky. The software interfaces are unsafe, and data storage/transfer is not sufficiently encrypted. Here are the steps to keep networks and systems safe and secure: Introduce IoT security during the design phase: IoT security strategy has the greatest value if it is introduced from the very beginning, the design stage. Most concerns and threats that have risks to an Internet of Things solution may be avoided by identifying them during preparation and planning. Network security: Since networks pose the risk of any IoT device being remotely controlled, they play a critical role in cyber protection strategy. The network stability is ensured by port security, animal ware, firewall, and banned IP addresses that are not usually used by a user. API security: Sophisticated businesses and websites use APIs to connect services, transfer data, and integrate various types of information in one place, making them a target for hackers. A hacked API can result in the disclosure of confidential information. That is why only approved apps and devices should be permitted to send requests and responses with APIs. Segmentation: It is important to follow segmentation for a corporate network if multiple IoT devices are connecting directly to the web. Each of the devices should use its small local network (segment) with limited access to the main network. Security gateways: Serve as an additional level in security IoT infrastructure before sending data produced by a device out to the Internet. They help track and analyze incoming and outgoing traffic, ensuring someone else cannot directly reach the gadget. Software updates: Users should be able to set changes to software and devices by updating them over a network connection or through automation. Improved software means incorporating new features as well as assisting in identifying and eliminating security defects in the early stages. Integrating teams: Many people are involved in the IoT development process. They are equally responsible for ensuring the product’s security throughout the full lifecycle. It is preferable to have IoT developers get together with security experts to share guidance and necessary security controls right from the design stage. Our team consists of cross-functional experts who are involved from the beginning to the end of the project. We support clients with developing digital strategies based on the requirements analysis, planning an IoT solution, and performing IoT security testing services so they can launch a glitch-free Internet of Things product. Conclusion To create trustworthy devices and protect them from cyber threats, you have to maintain a defensive and proactive security strategy throughout the entire development cycle. I hope you take away some helpful tips and tricks that will help you test your IoT security. If you have any questions, feel free to comment below.

By Anna Smith

What Is IoT Gateway? Is It Important

An IoT (Internet of Things) gateway is a device that acts as a bridge between connected IoT devices and other networks, such as the Internet. It provides a centralized platform for managing and processing data from multiple IoT devices and securely transmitting that data to the cloud or other systems for analysis, storage, and further processing. The IoT gateway can perform various functions, such as data aggregation, protocol translation, security management, and device management. An IoT gateway builds connections to connected IoT devices through various communication protocols, such as Wi-Fi, Ethernet, Zigbee, Z-Wave, or others. The gateway uses these protocols to communicate with the IoT devices and receive data from them. The gateway can also establish connections to other networks, such as the Internet, through Wi-Fi or Ethernet, to transmit the data it collects from IoT devices to the cloud or other systems for further processing. To ensure the secure transmission of data, the IoT gateway typically employs encryption and authentication methods. Additionally, the gateway can be configured to perform data processing and storage locally to reduce the amount of data transmitted to the cloud or other systems. Why IoT Gateways Are Important IoT gateways are important for several reasons: Connectivity: IoT gateways provide a central platform for connecting and communicating with multiple IoT devices, which may use different communication protocols. The gateway acts as a bridge, allowing these devices to communicate with each other and with other systems, such as the cloud or a local network. Data processing: IoT gateways can perform data processing tasks such as data aggregation, protocol translation, data filtering, and data compression, reducing the amount of data transmitted to the cloud and improving the efficiency of the IoT network. Security: IoT gateways provide a secure connection between IoT devices and other systems, using encryption and authentication methods to protect transmitted data. This ensures the privacy and security of the IoT network and the connected devices. Device management: IoT gateways can manage and control connected IoT devices, updating their firmware, configuring their settings, and monitoring their performance. This simplifies the management of a large number of connected devices and reduces the maintenance overhead. Cost savings: By performing data processing and storage locally, IoT gateways can reduce the amount of data transmitted to the cloud, reducing the cost of data storage and transmission. Overall, the IoT gateway is an essential component of an IoT network, providing a centralized platform for connecting, managing, and processing data from connected devices. How Does an IoT Gateway Work? An IoT gateway works by serving as a communication hub between IoT devices and other systems, such as the cloud or a local network. It acts as a bridge, connecting devices that use different communication protocols and enabling them to communicate with each other. The following are the key steps involved in the working of an IoT gateway: Data collection: The IoT gateway collects data from the connected IoT devices using communication protocols such as Wi-Fi, Ethernet, Zigbee, Z-Wave, or others. Data processing: The gateway can perform data processing tasks such as data aggregation, protocol translation, data filtering, and data compression, among others. Data transmission: The processed data is transmitted to the cloud or other systems for further analysis and storage. Security: The IoT gateway employs security measures, such as encryption and authentication, to protect the transmitted data and ensure secure communication between the devices and the cloud or other systems. Device management: The IoT gateway can manage and control connected IoT devices, updating their firmware, configuring their settings, and monitoring their performance. Overall, the IoT gateway plays a crucial role in the functioning of an IoT network, enabling connected devices to communicate with each other and with other systems and providing a platform for data processing and management. How Many Types of IoT Are There? IoT gateways come in different types based on their form factor, connectivity options, processing capabilities, and other factors. Some of the common types of IoT gateways are: Industrial IoT gateways: These gateways are designed for industrial and commercial applications, such as factory automation and building management systems. They are rugged, have multiple connectivity options, and can operate in harsh environments. Home automation gateways: These gateways are designed for use in residential environments to control and manage connected home devices, such as smart locks, lighting systems, and thermostats. Wireless IoT gateways: These gateways are designed for wireless communication with connected devices, using protocols such as Wi-Fi, Zigbee, Z-Wave, or others. They provide a low-power, low-cost solution for connecting devices in a small area. Embedded IoT gateways: These gateways are integrated into the connected devices themselves, providing a compact and integrated solution for small IoT networks. Multi-protocol IoT gateways: These gateways can communicate with devices using multiple communication protocols, such as Wi-Fi, Ethernet, Zigbee, Z-Wave, and others. They provide a flexible solution for connecting a variety of devices to a network. Cloud-based IoT gateways: These gateways are hosted in the cloud, providing a remote access solution for managing and processing data from connected devices. Each type of IoT gateway has its own advantages and disadvantages, and the choice of the right gateway depends on the specific requirements of the IoT network and the connected devices.

By Paridhi Dhamani

IoT

DZone's Featured IoT Resources

Top IoT Experts

The Latest IoT Topics