Interactive Building Data Analyzer

About This Tool

This Interactive Building Performance Analyzer was built by Max Berggren (Data Scientist and Energy Engineer). Its primary goal is to provide the fairest and most transparent comparison tool available for determining the real impact of upgrades or control strategy changes made to your building systems.

Understanding the true effect of an investment, like new windows or an optimized control algorithm, can be challenging due to many confounding factors, especially weather variations. This tool aims to cut through that noise by normalizing data and applying clear, pedagogical analysis methods.

This tool is the current workhorse powering all impact analytics on building energy performance conducted by Myrspoven AB in Stockholm. We are making it openly available because we believe this methodology is significantly more robust and accurate than traditional methods such as 'heat load signatures' or 'degree-day corrections'.

Open Source and Community

This tool is fully open source! We believe in transparency and collaborative improvement. Contributions, suggestions, and feedback are highly encouraged. You can find the project and contribute on GitHub.

Changelog

2025-01-05 - Temperature Analysis Enhancements

New Feature: Added "Average of signals..." option in temperature analysis, allowing users to analyze the combined effect of multiple signals by calculating their arithmetic mean for each data point.
Improved Visualization: Fixed error bar (whisker) styling for unreliable data bins; they now appear dimmed, with colors and opacity matching their corresponding bars.
Better Data Separation: Reliable and unreliable data are now displayed as separate traces with consistent styling, making it clearer which temperature bins contribute to the analysis calculations.
Enhanced User Experience: Selecting multiple signals is intuitive, with clear instructions, and the interface updates dynamically to show which signals are being averaged.

Welcome to the Interactive Building Performance Analyzer!

This tool helps you analyze time-series data from your building systems to understand performance, particularly the impact of control strategies.

How it works:

Upload Your Data: Provide a CSV or Excel file containing your building's operational data (timestamps, sensor readings, control signals).
Map Columns: Tell the tool which columns in your file correspond to key data points like outdoor temperature, the main control signal you want to analyze, and other relevant sensor signals.
Configure Analysis: Set filters for date ranges, outdoor temperature, hour of the day, and define how the "ON" and "OFF" states of your control signal are determined.
View Results: Explore interactive charts and detailed textual explanations that quantify the impact of your control strategy under various conditions.

Motivation

Analyzing building performance data, especially when comparing different time periods like months or years, can be tricky. Why? Because the weather can be vastly different! One year might be significantly colder or warmer than another, sometimes by as much as 30% in terms of heating or cooling demand.

Imagine you've invested in new, energy-efficient windows for your building. You'd expect your energy bills to go down. However, if the following year turns out to be much colder than the previous one, you might be surprised to see higher energy consumption despite the new windows. This doesn't necessarily mean the windows aren't working; the weather variation is masking their positive impact.

This tool helps you overcome this challenge. By normalizing data against outdoor temperature and other relevant factors, it allows for a fairer and more accurate comparison of your building's performance and the effectiveness of any changes or control strategies you've implemented. It helps you see the true impact, beyond the weather.

This tool is designed to be highly pedagogical. Each step and analysis method is explained in detail, helping you understand not just the "what" but also the "how" and "why" of the analysis.

Click "Next" or select a step from the navigation above to begin.

1. Upload Your Data

Select your CSV or Excel file:

Data Format Requirements:

Your file should be a standard CSV (Comma-Separated Values) or Excel (XLSX, XLS) file.
The first row should contain column headers (e.g., "Timestamp", "Outdoor Temp", "Control_Signal_X", "Sensor_Y").
Ensure your timestamp column is consistently formatted (e.g., "YYYY-MM-DD HH:MM:SS" or ISO 8601). You'll specify this in the next step.
Data should be time-series, typically with hourly or sub-hourly resolution.
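
As a rough illustration of these requirements, the Python sketch below checks three things in a CSV file. First, it confirms the file has a header row. Second, it parses every timestamp with one fixed format. Third, it estimates the resolution from the median gap between rows. The helper name `check_csv` and its default format are assumptions made for this example and are not part of the tool. The analyzer itself runs in the browser, and its parser may behave differently.

```python
import csv
import io
import statistics
from datetime import datetime


def check_csv(text: str, timestamp_col: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> dict:
    """Check for a header row, consistently formatted timestamps, and a regular resolution.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    if timestamp_col not in headers:
        raise ValueError(f"missing timestamp column {timestamp_col!r}; found {headers}")

    # Parse every timestamp with one fixed format; a ValueError here means
    # the column is not consistently formatted.
    stamps = [datetime.strptime(row[timestamp_col], fmt) for row in reader]
    if len(stamps) < 2:
        raise ValueError("need at least two rows to infer a time resolution")

    # The median gap between consecutive rows is a robust estimate of the resolution.
    gaps_min = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
    return {
        "headers": headers,
        "rows": len(stamps),
        "resolution_minutes": statistics.median(gaps_min),
    }


sample = """Timestamp,Outdoor Temp,Control_Signal_X,Sensor_Y
2023-10-26 14:00:00,15.5,1,20.1
2023-10-26 15:00:00,15.2,1,20.0
2023-10-26 16:00:00,14.8,0,20.4
"""
report = check_csv(sample, "Timestamp")
print(report)
```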

Your Privacy Matters:

Your data never leaves your computer. This tool runs entirely in your web browser.
All analysis is performed locally on your machine.
No data is uploaded to any server, collected, or stored by us. Your information remains private.

Don't have a file? Try the example:

2. Map Your Data Columns

Please tell us which columns from your uploaded file correspond to the following key data types. The dropdowns below are populated with the headers from your file.

A. Timestamp Column

Select the column containing timestamps.

Specify the timestamp format if it is not detected automatically (e.g., YYYY-MM-DD HH:mm:ss or MM/DD/YYYY hh:mm A). Leave blank to attempt auto-detection.

Why this is important: Accurate time information is fundamental for any time-series analysis. We need to know when each data point was recorded.

Example: If your data looks like 2023-10-26 14:30:00, 15.5, 1, ..., and "2023-10-26 14:30:00" is in a column named "ReadingTime", you'd select "ReadingTime" here.
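
One way to picture auto-detection is to try a list of candidate formats until one of them parses every value in the column. The sketch below uses a hypothetical `CANDIDATE_FORMATS` list written in Python `strptime` notation. For instance, `%m/%d/%Y %I:%M %p` corresponds to MM/DD/YYYY hh:mm A. This is an illustration and not the tool's actual detection logic.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidate list, tried in order.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%m/%d/%Y %I:%M %p",
    "%d/%m/%Y %H:%M",
]


def detect_format(values: list[str]) -> Optional[str]:
    """Return the first candidate format that parses every value, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            for v in values:
                datetime.strptime(v.strip(), fmt)
        except ValueError:
            continue  # at least one value failed; try the next format
        return fmt
    return None


print(detect_format(["2023-10-26 14:30:00", "2023-10-26 15:30:00"]))
print(detect_format(["10/26/2023 02:30 PM", "10/26/2023 03:30 PM"]))
```

A value such as 01/02/2023 can be read either day-first or month-first. In that case the order of the candidates decides the result, which is why entering the format explicitly is safer for ambiguous data.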

B. Outdoor Temperature Column

Select the column for outdoor air temperature (typically in °C):

Why this is important: Building energy consumption and system performance are heavily influenced by outdoor temperature. This allows us to normalize comparisons, ensuring we're comparing system states under similar external conditions.

Example: A column named "OAT" or "ExternalTemp" containing values like 10.2, 15.7, etc.

C. Primary Control Signal

Select the main control signal column you want to analyze (e.g., Primary Control Signal, AI_Enable):

Why this is important: This is the signal whose impact you want to assess. The analysis will compare system behavior when this signal is "ON" versus "OFF".

Example: A column named "AI_Mode" with values 1 (for ON) and 0 (for OFF), or a specific system's status signal.

D. Signals for Analysis

Select one or more sensor/signal columns to analyze (e.g., heating circuit temp, AHU supply temp, energy meter readings):

Why this is important: These are the dependent variables. We'll examine how their values change when the Primary Control Signal (C) is ON versus OFF, normalized by Outdoor Temperature (B).

Example: Columns like "Supply_Air_Temp_AHU1", "Heating_Valve_Position", "CHW_Flow_Rate", "Electricity_Meter_kWh". You can select multiple.

3. Configure Analysis Parameters

Control Signal Definition

Define how "ON" and "OFF" states are determined for your selected Primary Control Signal.

Control Logic:

Threshold-based Logic: For a numeric control signal (e.g., where 1 = ON, 0 = OFF, or a percentage activation), you define ranges.

ON State Range: Values within this range (inclusive) are considered "ON". Example: If your signal is 0-1, ON might be 0.9 to 1.0.
OFF State Range: Values within this range (inclusive) are considered "OFF". Example: 0.0 to 0.2.

The script uses these thresholds to classify each data point based on the *daily average* value of the control signal. All hours of a day where the daily average falls into the "ON" range are considered "ON" hours for the analysis (and similarly for "OFF").
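
Below is a minimal Python sketch of this daily-average rule. It assumes the control signal is given as (timestamp, value) pairs and that both ranges are inclusive. The ON range is checked first, so ON wins if the two ranges overlap. Days that fall in neither range are marked `None` and drop out of the comparison. The function name `classify_days` is hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Optional


def classify_days(
    samples: list[tuple[datetime, float]],
    on_range: tuple[float, float],
    off_range: tuple[float, float],
) -> dict[date, Optional[int]]:
    """Map each calendar day to 1 (ON), 0 (OFF) or None from the day's average control value."""
    per_day: dict[date, list[float]] = defaultdict(list)
    for ts, value in samples:
        per_day[ts.date()].append(value)

    states: dict[date, Optional[int]] = {}
    for day, values in per_day.items():
        avg = sum(values) / len(values)
        if on_range[0] <= avg <= on_range[1]:       # ON is checked first
            states[day] = 1
        elif off_range[0] <= avg <= off_range[1]:
            states[day] = 0
        else:
            states[day] = None                      # neither range: excluded
    return states


samples = (
    [(datetime(2023, 1, 1, h), 1.0) for h in range(24)]                      # avg 1.0
    + [(datetime(2023, 1, 2, h), 1.0 if h < 3 else 0.0) for h in range(24)]  # avg 0.125
    + [(datetime(2023, 1, 3, h), 0.5) for h in range(24)]                    # avg 0.5
)
states = classify_days(samples, on_range=(0.9, 1.0), off_range=(0.0, 0.2))
# Every hour then inherits the state of its day.
hourly = [(ts, states[ts.date()]) for ts, _ in samples]
```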

Split by Date Ranges Logic: Manually define periods when the control was ON and when it was OFF. This is useful if the control isn't a simple signal in your data but was enabled/disabled based on a schedule.

Date Range Selection

Specify the overall date periods for your "ON state" data and "OFF state" data. The analysis will only consider data within these respective periods.

Example: You might have run an AI optimization (ON state) from 2023-01-01 to 2023-06-30, and want to compare it to a baseline period (OFF state) from 2022-01-01 to 2022-06-30.

Outdoor Temperature Filter

Filter by outdoor temperature (°C):

Filter the data to include only points where the outdoor temperature was within the selected range. This applies to all signals and helps ensure comparisons are made under relevant conditions.

Example: If you're analyzing heating, you might filter for temperatures below 15°C. If analyzing cooling, above 20°C.

Hour of Day Filter

Filter by hour of day (0-24):

Filter data to specific hours of the day. This is useful for analyzing specific operational periods, like occupied hours (e.g., 8-17) or nighttime setbacks.

Example: To analyze performance only during working hours, you might set this to 8:00 - 18:00.
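
Both filters reduce to simple range checks. In this sketch, each row is assumed to be a dict with the hypothetical keys `ts` (a `datetime`) and `oat` (outdoor temperature in °C). The text does not say whether the end hour is inclusive. The sketch therefore assumes a start-inclusive, end-exclusive hour range, so `(8, 18)` keeps 08:00 through 17:59.

```python
from datetime import datetime


def apply_filters(rows, temp_range=(float("-inf"), float("inf")), hour_range=(0, 24)):
    """Keep rows inside the outdoor temperature range (inclusive) and the hour range.

    Hour range assumed start-inclusive, end-exclusive: (8, 18) keeps 08:00-17:59.
    """
    t_lo, t_hi = temp_range
    h_lo, h_hi = hour_range
    return [r for r in rows if t_lo <= r["oat"] <= t_hi and h_lo <= r["ts"].hour < h_hi]


rows = [
    {"ts": datetime(2023, 1, 10, 7), "oat": 2.0},       # before 08:00
    {"ts": datetime(2023, 1, 10, 8), "oat": 3.0},       # kept
    {"ts": datetime(2023, 1, 10, 17), "oat": 16.0},     # too warm for a heating analysis
    {"ts": datetime(2023, 1, 10, 17, 30), "oat": 4.0},  # kept
    {"ts": datetime(2023, 1, 10, 18), "oat": 5.0},      # 18:00 is outside (8, 18)
]
heating_hours = apply_filters(rows, temp_range=(float("-inf"), 15.0), hour_range=(8, 18))
print([r["oat"] for r in heating_hours])  # [3.0, 4.0]
```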

4. Analysis Results

This section will display the various analyses performed on your data based on the configurations you've set. Ensure you have completed steps 1-3.

Initial Data Overview & ON/OFF Distribution

This section provides an initial overview of your data after filtering and control state assignment.

ON/OFF Sample Distribution Chart: Shows the count of data samples classified as "ON" and "OFF" for each day, after applying your initial date range and control signal definition filters. It helps you visualize the prevalence of ON vs. OFF states over time in your filtered dataset before temperature or hour filtering is applied.

Control Signal Counts: These metrics show the number of hours your system was in the "ON" state versus the "OFF" state, based on all applied filters (date ranges, control definition, temperature, and hour of day).

Methodology (Daily ON/OFF Chart):

The tool first takes the data within your specified "ON state period" and "OFF state period".
It then applies your "Control Signal Definition" (thresholds or date-based split) to classify data points.
- If threshold-based: For each day, it calculates the average value of your Primary Control Signal. If this daily average falls within your defined "ON threshold range", all hours of that day are marked as "ON". If it falls within the "OFF threshold range" (and isn't already ON), all hours are marked "OFF".
- If date-based split: Data points within the "ON state date range" are marked "ON", and those in the "OFF state date range" are marked "OFF".
The bar chart then counts how many hourly samples end up in the "ON" state (orange) and "OFF" state (grey) for each calendar day.
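
The date-based split and the per-day counting behind the chart can be sketched as follows. The periods are assumed to be inclusive (start_date, end_date) pairs. The function names are illustrative only.

```python
from collections import Counter
from datetime import date, datetime


def state_from_date_ranges(ts, on_period, off_period):
    """Date-based split: 1 inside the ON period, 0 inside the OFF period, else None."""
    d = ts.date()
    if on_period[0] <= d <= on_period[1]:
        return 1
    if off_period[0] <= d <= off_period[1]:
        return 0
    return None


def daily_on_off_counts(labelled):
    """Count ON and OFF samples per calendar day; samples with state None are skipped."""
    on, off = Counter(), Counter()
    for ts, state in labelled:
        if state == 1:
            on[ts.date()] += 1
        elif state == 0:
            off[ts.date()] += 1
    return {d: (on[d], off[d]) for d in sorted(set(on) | set(off))}


on_period = (date(2023, 1, 1), date(2023, 6, 30))
off_period = (date(2022, 1, 1), date(2022, 6, 30))
stamps = [datetime(2022, 1, 1, h) for h in range(24)] + [datetime(2023, 1, 1, h) for h in range(24)]
labelled = [(ts, state_from_date_ranges(ts, on_period, off_period)) for ts in stamps]
counts = daily_on_off_counts(labelled)  # {day: (on_samples, off_samples)}
```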

Methodology (Control Signal Counts):

The full dataset is filtered by your selected date ranges for ON and OFF periods.
The Primary Control Signal is evaluated based on your chosen definition (thresholds on daily averages or date-based split) to assign an initial "control_state" (1 for ON, 0 for OFF, or NA if neither).
The Outdoor Temperature filter is applied.
The Hour of Day filter is applied.
The remaining data points are counted:
- Control = ON: Total hours where `control_state` is 1 after all filters.
- Control = OFF: Total hours where `control_state` is 0 after all filters.
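
Once every remaining sample carries a `control_state` of 1, 0 or `None`, the counts are a simple tally. The sketch below makes two assumptions. It converts samples to hours with a fixed sampling interval (`minutes_per_sample`). It also computes the ON and OFF shares relative to the classified (ON + OFF) hours, because the tool's exact percentage base is not stated here.

```python
def control_hours(states, minutes_per_sample=60.0):
    """ON/OFF hours and shares after all filters; None (NA) samples are ignored."""
    on = sum(1 for s in states if s == 1)
    off = sum(1 for s in states if s == 0)
    hours_per_sample = minutes_per_sample / 60.0
    classified = on + off
    return {
        "on_hours": on * hours_per_sample,
        "off_hours": off * hours_per_sample,
        # Assumed base: share of classified (ON + OFF) time.
        "on_pct": 100.0 * on / classified if classified else 0.0,
        "off_pct": 100.0 * off / classified if classified else 0.0,
    }


# Eight 15-minute samples: five ON, two OFF, one NA.
summary = control_hours([1, 1, 1, 0, None, 0, 1, 1], minutes_per_sample=15)
```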

Use Case for Energy Engineers: Quickly verify if the control strategy was active as expected. Understand the proportion of time the control strategy was active versus inactive within the specific conditions (temperature, time of day) you're analyzing. This is crucial context for interpreting subsequent impact analyses.

Outdoor Temperature Normalized Analysis for

Select Signal for Temperature Analysis:

Apply Affinity Law Correction (for Fan/Pump Pressure signals)

Normalize set temperatures for heating circuits to 20°C

This is a core analysis that compares the selected signal's behavior during "ON" vs. "OFF" control states, normalized by outdoor temperature. This ensures fair comparisons by only looking at data under similar weather conditions.

Methodology:

Data Preparation:
- The data is filtered based on all your selections (date ranges, control definition, outdoor temperature range, hour of day range).
- The chosen "Signal for Analysis" (e.g., Supply Air Temperature) is isolated.
- Average of Signals: When "Average of signals..." is selected, the tool calculates the arithmetic mean of the selected signals for each data point. Only data points where at least one of the selected signals has a valid numeric value are included, and the average is calculated using only the valid values for that timestamp.
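
The "Average of signals..." rule can be sketched per timestamp as shown below. Missing values and NaN are skipped. A timestamp with no valid value yields `None` and is dropped. The signal names `VS1_supply` and `VS2_supply` are hypothetical.

```python
import math
from typing import Optional


def average_of_signals(row: dict, signals: list[str]) -> Optional[float]:
    """Arithmetic mean of the selected signals at one timestamp, using only valid numbers."""
    valid = []
    for name in signals:
        v = row.get(name)
        if isinstance(v, (int, float)) and not isinstance(v, bool) and math.isfinite(v):
            valid.append(float(v))
    return sum(valid) / len(valid) if valid else None


rows = [
    {"VS1_supply": 40.0, "VS2_supply": 44.0},
    {"VS1_supply": 41.0, "VS2_supply": float("nan")},
    {"VS1_supply": None, "VS2_supply": None},
]
averages = [average_of_signals(r, ["VS1_supply", "VS2_supply"]) for r in rows]
print(averages)  # [42.0, 41.0, None]
```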
Temperature Binning:
To compare apples to apples, we group data into "temperature bins". For example, we might use 2°C wide bins: 0-2°C, 2-4°C, 4-6°C, and so on, covering the range of your filtered outdoor temperature data.

Example: If an ON data point occurred at 3.1°C and an OFF data point at 3.5°C, both would fall into the 2-4°C bin.
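
Assigning a reading to a bin amounts to flooring the temperature to a multiple of the bin width. The sketch assumes left-closed, right-open bins, so a reading of exactly 4.0°C lands in the 4-6°C bin. The tool may handle values on a bin edge differently.

```python
import math


def temperature_bin(temp_c: float, width: float = 2.0) -> tuple[float, float]:
    """Return the (lower, upper) edges of the bin containing temp_c (left-closed, right-open)."""
    lower = math.floor(temp_c / width) * width
    return (lower, lower + width)


print(temperature_bin(3.1))   # (2.0, 4.0)
print(temperature_bin(3.5))   # (2.0, 4.0)
print(temperature_bin(-0.5))  # (-2.0, 0.0)
```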

Statistics per Bin:

Within each temperature bin, we calculate statistics for the "Signal for Analysis" separately for "ON" periods and "OFF" periods:

Average (Mean) Value: The central tendency.
Sample Count: How many data points fall into this bin for ON and OFF states.
95% Data Range (Percentiles): The range covering 95% of the data points (specifically, the 2.5th to 97.5th percentiles). This gives an idea of the data spread.

Example Table for a "Supply Air Temp" signal:

Temp Bin	State	Avg Temp	Samples	2.5th Pctl	97.5th Pctl
2-4°C	ON	18.5°C	120	17.0°C	20.0°C
2-4°C	OFF	20.1°C	150	18.5°C	21.5°C
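
The per-bin statistics can be sketched in plain Python, together with the 10-sample reliability threshold used by the impact metrics. Two details are assumptions. The percentile uses linear interpolation between sorted values, which matches NumPy's default method, but the tool may use a different definition. The input layout of (outdoor temperature, state, value) tuples is hypothetical.

```python
import math
from collections import defaultdict


def percentile(sorted_vals: list[float], q: float) -> float:
    """Linear-interpolation percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bin_statistics(points, width=2.0, min_samples=10):
    """Mean, count, 2.5th/97.5th percentiles and a reliability flag per (bin, state).

    points: iterable of (outdoor_temp_c, state, value), state 1 = ON, 0 = OFF.
    """
    groups = defaultdict(list)
    for temp, state, value in points:
        lower = math.floor(temp / width) * width
        groups[(lower, state)].append(value)

    stats = {}
    for key, values in groups.items():
        values.sort()
        stats[key] = {
            "mean": sum(values) / len(values),
            "n": len(values),
            "p2_5": percentile(values, 2.5),
            "p97_5": percentile(values, 97.5),
            "reliable": len(values) >= min_samples,
        }
    return stats


points = [(3.0, 1, 18.0 + 0.1 * i) for i in range(11)] + [(2.5, 0, 20.0)] * 5
stats = bin_statistics(points)  # keys: (bin lower edge, state)
```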

Visualization:
A bar chart displays the average "ON" value and average "OFF" value for each temperature bin side-by-side. Error bars often show the 95% data range.

Bars for bins with fewer than 10 hourly samples are dimmed and treated as unreliable, because averages from small sample sizes can be misleading.
Impact Metrics:
- Simple Average Difference: For each temperature bin that has sufficient samples (at least 10 hours) for *both* the ON state and the OFF state (i.e., a reliable overlapping bin), calculate `Difference = Avg_ON_in_bin - Avg_OFF_in_bin`. The "Simple Average Difference" is the average of these individual differences across all such reliable overlapping bins.
  Example: If for reliable overlapping bins 0-2°C, 2-4°C, 4-6°C the differences are -1.0°C, -1.2°C, -0.8°C, the simple average diff is `(-1.0 - 1.2 - 0.8) / 3 = -1.0°C`.
- Simple Percentage Difference: `(Simple Average Difference / Grand_Average_OFF_Value) * 100%`. The Grand_Average_OFF_Value is the average of all Avg_OFF values from the same reliable overlapping bins used for the Simple Average Difference.
  Example: If Grand_Average_OFF_Value (from reliable overlapping bins) is 20°C, then `(-1.0°C / 20°C) * 100% = -5.0%`.
- Temperature-Occurrence-Weighted Difference: This metric also considers only temperature bins that are reliable for both ON and OFF states. It accounts for how often each such bin occurs in your *entire filtered dataset* (both ON and OFF periods); bins that occur more frequently get a higher "weight".
  Calculation Sketch:
  1. Calculate occurrence frequency (weight) of each temp bin in the total dataset. E.g., 2-4°C occurs 20% of the time, 4-6°C occurs 30%.
  2. For each reliable overlapping bin (sufficient samples for both ON and OFF): `Weighted_Component = (Avg_ON - Avg_OFF)_in_bin * Weight_of_bin`.
  3. `Weighted Average Difference = Sum of all Weighted_Components / Sum of Weights for these bins`.
  This provides a more realistic impact estimate reflecting typical operating conditions.
- Uptime Correction: The calculated differences (simple and weighted) reflect the impact during the hours the control was ON. If the control strategy had an uptime of, say, 80% within the "ON" periods you're analyzing, the "Uptime Corrected" values scale the impact to that uptime: `Uptime_Corrected_Difference = Calculated_Difference * (Uptime_Percentage / 100)`. This shows the overall impact given the actual operational uptime.
- Affinity Law Correction (for Fan/Pump Pressure signals): If you are analyzing a pressure signal for fans or pumps (e.g., "AHU Pressure"), a pressure difference does not translate linearly into energy savings. Under the Affinity Laws, power scales with pressure to the power 1.5 (Power ∝ Pressure^1.5), so `Power_Change_Percent ≈ ((1 + Pressure_Change_Percent/100)^1.5 - 1) * 100`. This estimates the percentage change in fan/pump power.
  Example: A 10% reduction in pressure (`-10%`) gives `((1 - 0.1)^1.5 - 1) * 100% = ((0.9)^1.5 - 1) * 100% ≈ (0.854 - 1) * 100% ≈ -14.6%`, i.e., an estimated 14.6% reduction in fan/pump power.
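
The impact metrics above can be reproduced in a short Python sketch. The input layout and all function names are assumptions made for illustration. Bins are stored in a dict keyed by the bin's lower edge, holding ON/OFF means and sample counts. A second dict holds per-bin occurrence counts from the whole filtered dataset. The usage lines reuse the numbers from the worked examples: differences of -1.0, -1.2 and -0.8 against an OFF average of 20, plus one bin with too few ON samples.

```python
def reliable_bins(bins, min_samples=10):
    """Bins with at least min_samples hourly samples in BOTH the ON and the OFF state."""
    keys = [b for b, s in bins.items() if s["on_n"] >= min_samples and s["off_n"] >= min_samples]
    if not keys:
        raise ValueError("no bin has enough ON and OFF samples")
    return keys


def simple_difference(bins, min_samples=10):
    """Simple Average Difference and Simple Percentage Difference over reliable bins."""
    keys = reliable_bins(bins, min_samples)
    avg_diff = sum(bins[b]["on_mean"] - bins[b]["off_mean"] for b in keys) / len(keys)
    grand_off = sum(bins[b]["off_mean"] for b in keys) / len(keys)
    return avg_diff, 100.0 * avg_diff / grand_off


def weighted_difference(bins, occurrences, min_samples=10):
    """Temperature-occurrence-weighted difference; occurrences cover ON and OFF samples."""
    keys = reliable_bins(bins, min_samples)
    total = sum(occurrences.values())
    weights = {b: occurrences[b] / total for b in keys}
    weighted_sum = sum((bins[b]["on_mean"] - bins[b]["off_mean"]) * weights[b] for b in keys)
    return weighted_sum / sum(weights.values())


def uptime_corrected(difference, uptime_pct):
    """Scale a difference by the control's uptime within the ON periods."""
    return difference * uptime_pct / 100.0


def affinity_power_change_pct(pressure_change_pct):
    """Estimated fan/pump power change (%) from a pressure change (%), Power ∝ Pressure^1.5."""
    return ((1 + pressure_change_pct / 100.0) ** 1.5 - 1) * 100.0


bins = {
    0.0: {"on_mean": 19.0, "off_mean": 20.0, "on_n": 40, "off_n": 35},
    2.0: {"on_mean": 18.8, "off_mean": 20.0, "on_n": 60, "off_n": 70},
    4.0: {"on_mean": 19.2, "off_mean": 20.0, "on_n": 50, "off_n": 45},
    6.0: {"on_mean": 17.0, "off_mean": 20.0, "on_n": 5, "off_n": 30},  # too few ON samples
}
occurrences = {0.0: 75, 2.0: 130, 4.0: 95, 6.0: 35}

avg_diff, pct_diff = simple_difference(bins)       # about -1.0 and -5.0 %
weighted = weighted_difference(bins, occurrences)  # about -1.023
corrected = uptime_corrected(avg_diff, 80)         # about -0.8
power = affinity_power_change_pct(-10)             # about -14.6 %
```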
Day-Specific Analysis: The above analysis is often repeated for "All Days", "Weekdays", "Weekends", and even individual days of the week to see if the control strategy's impact varies.
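
Day-specific grouping can be sketched by tagging each timestamp with every group it belongs to. The bin analysis then runs once per group. Treating Saturday and Sunday as the weekend is an assumption of this sketch.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_groups(ts: datetime) -> list[str]:
    """All day-type groups a timestamp belongs to (Saturday/Sunday assumed to be the weekend)."""
    wd = ts.weekday()  # Monday == 0 ... Sunday == 6
    return ["All Days", "Weekends" if wd >= 5 else "Weekdays", DAY_NAMES[wd]]


print(day_groups(datetime(2023, 10, 28, 12)))  # ['All Days', 'Weekends', 'Saturday']
```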

Use Case for Energy Engineers: This is the workhorse analysis to quantify control strategy effectiveness. It helps answer: "How much does my AI/control change signal X when it's ON, compared to when it's OFF, under similar outdoor conditions?" Is it reducing heating supply temps? Increasing AHU pressure? By how much?