1 files changed, 114 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2bcca2e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,114 @@
+# PicoStream
+
+PicoStream is a high-performance Python application for streaming data from a PicoScope to an HDF5 file, with an optional, decoupled live visualization. It is designed for robust, high-speed data logging where data integrity is critical.
+
+## Features
+
+- **Robust Architecture**: Separates data acquisition from disk I/O using a producer-consumer pattern with a large, shared memory buffer pool to prevent data loss.
+- **Zero-Risk Live Plotting**: The plotter reads from the HDF5 file, not the live data stream. This ensures that a slow or crashing GUI cannot interfere with data acquisition.
+- **Efficient Visualization**: Uses `pyqtgraph` and a Numba-accelerated min-max decimation algorithm to display large datasets with minimal CPU impact.
+- **Flexible Data Reading**: The included `PicoStreamReader` class allows for easy post-processing, including on-the-fly decimation (`'mean'` or `'min_max'`).
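+
+The min-max decimation mentioned above can be illustrated with plain NumPy. This is a minimal sketch, not PicoStream's Numba implementation: `min_max_decimate` is a hypothetical helper, and the interleaved min/max output layout and the dropping of a trailing partial bin are assumptions made for the example.
+
+```python
+import numpy as np
+
+
+def min_max_decimate(samples: np.ndarray, factor: int) -> np.ndarray:
+    """Keep the minimum and maximum of every `factor`-sample bin."""
+    if factor < 1:
+        raise ValueError("factor must be >= 1")
+    n_bins = samples.size // factor  # assumption: a trailing partial bin is dropped
+    bins = samples[: n_bins * factor].reshape(n_bins, factor)
+    out = np.empty(2 * n_bins, dtype=samples.dtype)
+    out[0::2] = bins.min(axis=1)  # even slots hold each bin's minimum
+    out[1::2] = bins.max(axis=1)  # odd slots hold each bin's maximum
+    return out
+
+
+# A one-sample glitch survives 100x decimation because each bin keeps its extremes.
+spike = np.zeros(1000, dtype=np.int16)
+spike[123] = 3000
+decimated = min_max_decimate(spike, 100)
+print(decimated.size, decimated.max())
+```
+
+Averaging the same bins would shrink the glitch to 30 counts, which is why min-max is the usual choice for display.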
+
+## Installation
+
+1. Install the package in editable mode from the repository root: `pip install -e .`
+
+2. Install the official PicoSDK from Pico Technology. Installation on Linux can take some additional work; see the AUR wiki for details.
+
+## Usage
+
+The primary way to use the package is through the `picostream` command-line tool.
+
+### Acquisition
+
+The following command starts an acquisition at 62.5 MS/s, saves the data to `my_data.hdf5`, and opens a live plot.
+
+```bash
+picostream -s 62.5 -o my_data.hdf5 --plot
+```
+
+Run `picostream --help` for a full list of options.
+
+### Viewing an Existing File
+
+The plotter can be run as a standalone tool to view any compatible HDF5 file.
+
+```bash
+python -m picostream.dfplot /path/to/your/data.hdf5
+```
+
+## Data Analysis with `PicoStreamReader`
+
+The output HDF5 file contains raw ADC counts and metadata. The `PicoStreamReader` class in `picostream.reader` is the recommended way to read and process this data.
+
+### Example: Processing a File in Chunks
+
+The following example uses `PicoStreamReader` to iterate through a large file and analyze it chunk by chunk, without loading the entire dataset into memory.
+
+```python
+import numpy as np
+from picostream.reader import PicoStreamReader
+
+# Use the reader as a context manager
+with PicoStreamReader('my_data.hdf5') as reader:
+    # Metadata is available as attributes after opening
+    sample_rate_sps = 1e9 / reader.sample_interval_ns
+    print(f"File contains {reader.num_samples:,} samples.")
+    print(f"Sample rate: {sample_rate_sps / 1e6:.2f} MS/s")
+
+    # Example: Iterate through the file with 10x decimation
+    print("\nProcessing data with 10x 'min_max' decimation...")
+    for times, voltages_mv in reader.get_block_iter(
+        chunk_size=10_000_000, decimation_factor=10, decimation_mode='min_max'
+    ):
+        # The 'times' and 'voltages_mv' arrays are now decimated.
+        # Process the smaller data chunk here.
+        print(f"  - Processed a chunk of {voltages_mv.size} decimated points.")
+
+    print("\nFinished processing.")
+```
+
+## API Reference: `PicoStreamReader`
+
+The `PicoStreamReader` provides a simple and efficient interface for accessing data.
+
+### Initialization
+
+#### `__init__(self, hdf5_path: str)`
+Initializes the reader. The file is opened and metadata is read when the object is used as a context manager.
+
+### Metadata Attributes
+
+These attributes are populated from the HDF5 file's metadata when the reader is opened.
+
+-   `num_samples: int`: Total number of raw samples in the dataset.
+-   `sample_interval_ns: float`: The time interval between samples in nanoseconds.
+-   `voltage_range_v: float`: The configured voltage range (e.g., `20.0` for ±20 V).
+-   `max_adc_val: int`: The maximum ADC count value (e.g., 32767).
+-   `analog_offset_v: float`: The configured analog offset in Volts.
+-   `downsample_mode: str`: The hardware downsampling mode used during acquisition (`'average'` or `'aggregate'`).
+-   `hardware_downsample_ratio: int`: The hardware downsampling ratio used.
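+
+To show how these attributes relate to the raw ADC counts stored in the file, here is a hedged sketch of a count-to-volt conversion. `adc_to_volts` is a hypothetical helper, not part of `PicoStreamReader`. The linear full-scale mapping and the sign with which the analog offset is applied are assumptions, so check them against the reader's source before relying on them.
+
+```python
+import numpy as np
+
+
+def adc_to_volts(counts, max_adc_val, voltage_range_v, analog_offset_v=0.0):
+    """Scale raw ADC counts to volts, assuming a linear full-scale mapping."""
+    volts = np.asarray(counts, dtype=np.float64) * (voltage_range_v / max_adc_val)
+    # Assumption: the configured analog offset is subtracted after scaling.
+    return volts - analog_offset_v
+
+
+# Zero, roughly half scale and full scale for a 20 V range.
+print(adc_to_volts([0, 16384, 32767], max_adc_val=32767, voltage_range_v=20.0))
+```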
+
+### Data Access Methods
+
+#### `get_block_iter(self, chunk_size: int = 1_000_000, decimation_factor: int = 1, decimation_mode: str = "mean") -> Generator`
+Returns a generator that yields data blocks as `(times, voltages)` tuples for the entire dataset. This is the recommended method for processing large files.
+-   `chunk_size`: The number of *raw* samples to read from the file for each chunk.
+-   `decimation_factor`: The factor by which to decimate the data.
+-   `decimation_mode`: The decimation method (`'mean'` or `'min_max'`).
+
+#### `get_next_block(self, chunk_size: int, decimation_factor: int = 1, decimation_mode: str = "mean") -> Tuple | None`
+Retrieves the next sequential block of data. Returns `None` when the end of the file is reached. Use `reset()` to start over.
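+
+A sequential read loop built on `get_next_block()` and `reset()` might look like the sketch below. `count_decimated_points` is a hypothetical helper. It assumes `reader` has already been opened with `with PicoStreamReader(...) as reader:` and that each returned block is a `(times, voltages)` tuple, as yielded by `get_block_iter()`.
+
+```python
+def count_decimated_points(reader, chunk_size=1_000_000, decimation_factor=10):
+    """Walk the file block by block and count the points returned."""
+    reader.reset()  # start from the beginning of the file
+    total = 0
+    while True:
+        block = reader.get_next_block(
+            chunk_size, decimation_factor=decimation_factor, decimation_mode="mean"
+        )
+        if block is None:  # end of file reached
+            break
+        times, voltages = block  # assumed tuple layout
+        total += len(voltages)
+    return total
+```
+
+Calling `reset()` first makes the helper safe to run more than once on the same open reader.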
+
+#### `get_block(self, size: int, start: int = 0, decimation_factor: int = 1, decimation_mode: str = "mean") -> Tuple`
+Retrieves a specific block of data from the file.
+-   `size`: The number of *raw* samples to retrieve.
+-   `start`: The starting sample index.
+
+#### `reset(self) -> None`
+Resets the internal counter for `get_next_block()` to the beginning of the file.
+
+
+## Acknowledgements
+
+This package began as a fork of [JoshHarris2108/pico_streaming](https://github.com/JoshHarris2108/pico_streaming) (which is unlicensed). I acknowledge and appreciate Josh's original idea and architecture.