An on-device text summarization app built with React Native and Expo. This app demonstrates how to run Large Language Models (LLMs) entirely on-device, providing zero-cost AI functionality with complete privacy—your data never leaves your device.
- On-device text summarization using quantized LLM models
- Privacy-first: All processing happens locally, no cloud API calls
- Zero cost: No per-token charges or server infrastructure needed
- Offline capable: Works without internet connection
- Multi-model support: Choose from different quantized models optimized for mobile
- Smart chunking: Automatically handles long documents by processing them in chunks
- Cross-platform: Runs on both iOS and Android
- Node.js (v18 or later)
- npm or yarn
- For iOS development:
  - macOS
  - Xcode (latest version)
  - CocoaPods (`sudo gem install cocoapods`)
- For Android development:
  - Android Studio
  - Android SDK
  - Java Development Kit (JDK)
```bash
git clone https://github.com/nearform/ai-beyond-the-cloud.git
cd ai-beyond-the-cloud
npm install
```

This app uses native modules (react-native-executorch) and requires a development build. Expo Go is not supported.
For iOS:
```bash
npm run ios
```

This builds the development build and launches it on the iOS simulator or a connected device.
For Android:
```bash
npm run android
```

This builds the development build and launches it on the Android emulator or a connected device.
Note:
- The first build may take a while as it compiles native code
- Subsequent runs will be faster with incremental builds
- Enter text: Paste or type the text you want to summarize in the input field
- Select model (optional): Tap the model selector to choose a different quantized model
- Generate summary: Tap the "Summarize" button
- View results: The summary will appear below the input field
- Clear: Tap "Clear" to reset the input and summary
The app automatically handles long texts by:
- Chunking text into manageable pieces
- Processing chunks sequentially with delays to prevent thermal throttling
- Combining chunk summaries into a final result (a simplified sketch of this flow follows)
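A minimal sketch of this chunk-and-combine flow, assuming hypothetical helper names (`chunkText`, `summarizeChunk`) and illustrative limits rather than the actual `summarizer.ts` internals:

```ts
// Hypothetical sketch of the chunk -> summarize -> combine pipeline.
// Names, sizes, and delays are illustrative, not the app's real values.

const CHUNK_SIZE = 2000;      // characters per chunk (assumed limit)
const CHUNK_DELAY_MS = 1500;  // pause between chunks to avoid thermal throttling

function chunkText(text: string, size: number = CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function summarizeLongText(
  text: string,
  summarizeChunk: (chunk: string) => Promise<string>,
): Promise<string> {
  const chunks = chunkText(text);
  const partials: string[] = [];
  for (const chunk of chunks) {
    partials.push(await summarizeChunk(chunk)); // sequential, not parallel
    await sleep(CHUNK_DELAY_MS);                // let the device cool down
  }
  // Combine the partial summaries; a real pipeline might re-summarize this.
  return partials.join('\n');
}
```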
```
├── app/
│   ├── _layout.tsx           # App layout and navigation
│   └── index.tsx             # Main screen component
├── components/               # Reusable UI components
├── utils/
│   ├── generation-core.ts    # Core LLM generation logic
│   ├── generation-service.ts # Generation session management
│   ├── model-manager.ts      # Model lifecycle management
│   ├── model-registry.ts     # Available models configuration
│   ├── summarizer.ts         # Text chunking utilities
│   └── use-model.ts          # React hook for model access
├── __tests__/                # Test suites
└── assets/                   # Images and static assets
```
- `npm start` - Start the Expo development server (required for development builds)
- `npm run ios` - Build and run on the iOS simulator/device
- `npm run android` - Build and run on the Android emulator/device
- `npm run web` - Run in a web browser (limited functionality; the on-device LLM is not available)
- `npm test` - Run the test suite
- `npm run test:watch` - Run tests in watch mode
- `npm run lint` - Run ESLint
The project includes comprehensive test coverage:
```bash
npm test
```

Tests cover:
- UI component interactions
- Text chunking logic
- Core generation functionality
- Model state management
- Error handling
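For illustration, a test for the chunking utility might look like this (hypothetical; it assumes a `chunkText` export that may not match the real `summarizer.ts` API):

```ts
// Hypothetical Jest test for the chunking utility; the real suites in
// __tests__/ may use different names and assertions.
import { chunkText } from '../utils/summarizer'; // assumed export

describe('chunkText', () => {
  it('splits long input into chunks no larger than the limit', () => {
    const text = 'a'.repeat(5000);
    const chunks = chunkText(text, 2000);
    expect(chunks).toHaveLength(3);
    expect(chunks.every((c) => c.length <= 2000)).toBe(true);
  });

  it('returns a single chunk for short input', () => {
    expect(chunkText('hello', 2000)).toEqual(['hello']);
  });
});
```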
The app uses react-native-executorch to run quantized PyTorch models on-device. Models are optimized for mobile with:
- INT4/INT8 quantization for reduced size and power consumption
- ExecuTorch runtime for efficient inference
- Smart memory management to prevent OOM crashes
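As an illustration, an entry in `model-registry.ts` might look roughly like this; the interface shape, field names, and values below are assumptions, not the file's actual contents:

```ts
// Illustrative shape for a model-registry.ts entry; the real
// configuration fields and values may differ.
export interface ModelConfig {
  id: string;
  displayName: string;
  quantization: 'int4' | 'int8';
  contextWindow: number;   // max tokens the model can attend to
  approxSizeMB: number;    // approximate size on disk (illustrative)
}

export const MODELS: ModelConfig[] = [
  {
    id: 'llama-3.2-1b-int4',
    displayName: 'Llama 3.2 1B (INT4)',
    quantization: 'int4',
    contextWindow: 2048,
    approxSizeMB: 700,
  },
];
```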
- Model Manager: Handles model loading, initialization, and lifecycle
- Generation Service: Manages generation sessions and prevents race conditions
- Generation Core: Core LLM inference logic with cooldown and locking mechanisms (a simplified sketch follows this list)
- React Hooks: `useLLMModel` provides reactive access to model state
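A minimal sketch of the cooldown-and-locking idea (hypothetical names and values; not the actual `generation-core.ts`):

```ts
// Hypothetical generation lock with a cooldown window; the real
// generation-core.ts logic may differ.
const COOLDOWN_MS = 3000; // assumed minimum gap between generations

let generating = false;
let lastFinishedAt = 0;

async function generateOnce(
  runInference: () => Promise<string>,
): Promise<string> {
  if (generating) {
    throw new Error('A generation is already in progress');
  }
  if (Date.now() - lastFinishedAt < COOLDOWN_MS) {
    throw new Error('Cooling down; please wait before generating again');
  }
  generating = true; // acquire the lock
  try {
    return await runInference();
  } finally {
    lastFinishedAt = Date.now();
    generating = false; // release the lock even on error
  }
}
```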
- Text is chunked to respect model context windows
- Delays between chunk processing prevent thermal throttling
- Input truncation prevents memory issues (a simple guard is sketched after this list)
- Generation cooldowns prevent rapid-fire requests
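A minimal sketch of such input guards, with illustrative (assumed) limits:

```ts
// Hypothetical input guards; the limits are illustrative assumptions,
// not the app's real constants.
const MAX_INPUT_CHARS = 20_000; // cap raw input size
const MAX_CHUNKS = 8;           // cap how many chunks one request may spawn

function truncateInput(text: string): string {
  return text.length > MAX_INPUT_CHARS ? text.slice(0, MAX_INPUT_CHARS) : text;
}

function limitChunks(chunks: string[]): string[] {
  return chunks.slice(0, MAX_CHUNKS);
}
```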
Model fails to load:

- Ensure you have sufficient device storage
- Check that the model files are properly bundled
- Try restarting the app

Generation is slow:

- Use a smaller model for faster inference
- Reduce input text length
- Close other apps to free up memory

App crashes or runs out of memory:

- The app limits input size and chunk count to prevent OOM
- If crashes persist, try a device with more RAM or a smaller model
- Check device logs for specific error messages
For a detailed technical deep-dive into on-device AI, quantization, and the architecture decisions behind this app, see BLOG.md.
This project is private and proprietary.
This is a demonstration project. For questions or issues, please open an issue on GitHub.