Main orchestration endpoint that coordinates classification, article fetching, embedding generation, and topic discovery across microservices. Returns 202 Accepted immediately so the client can poll for completion.
| Parameter | Type | Description |
| --- | --- | --- |
| query (required) | string | User's research question. |
| auto_detect | boolean | Default: `true`. Auto-detect the best source when no `source` is given. |
| max_articles | integer (>= 1) | Default: `50`. Cap on the number of articles processed. |
| nr_topics | integer (>= 1) | Default: `5`. Desired number of topics to generate. |
| min_cluster_size | integer (>= 1) | Default: `2`. Minimum articles per topic. |
| source | string | Enum: `"arxiv"`, `"reddit"`. |
| category | string | Manual category/feed override. |
{
  "query": "string",
  "auto_detect": true,
  "max_articles": 50,
  "nr_topics": 5,
  "min_cluster_size": 2,
  "source": "arxiv",
  "category": "string"
}
{
  "code": "string",
  "message": "string",
  "details": { }
}
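A minimal client-side sketch of building the request body with the documented defaults before submitting. The function name and default handling are illustrative, not part of the API:

```python
import json

# Documented defaults for the analysis request body.
ANALYSIS_DEFAULTS = {
    "auto_detect": True,
    "max_articles": 50,
    "nr_topics": 5,
    "min_cluster_size": 2,
}

def build_analysis_request(query: str, **overrides) -> dict:
    """Build an analysis request body, applying the documented defaults
    and enforcing the >= 1 constraints before sending."""
    body = {"query": query, **ANALYSIS_DEFAULTS, **overrides}
    for field in ("max_articles", "nr_topics", "min_cluster_size"):
        if body[field] < 1:
            raise ValueError(f"{field} must be >= 1")
    return body

payload = build_analysis_request("graph neural networks", source="arxiv", category="cs.LG")
print(json.dumps(payload, indent=2))
```

Because the endpoint returns 202 Accepted, the client should not wait on this request; it polls the returned analysis ID for completion instead.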
Returns previously submitted analyses ordered by creation date. Does not include nested topic/article data, to keep the payload small.
| Parameter | Type | Description |
| --- | --- | --- |
| limit | integer | Default: `20`. Maximum number of items to return. |
| offset | integer | Default: `0`. Offset into the result set for pagination. |
{
  "total": 0,
  "limit": 0,
  "offset": 0,
  "items": [
    {
      "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
      "status": "PENDING",
      "query": "string",
      "type": "research",
      "total_articles_processed": 0,
      "created_at": "2019-08-24T14:15:22Z",
      "topics": [
        {
          "id": "string",
          "title": "string",
          "description": "string",
          "article_count": 0,
          "relevance": 100,
          "articles": [
            {
              "id": "string",
              "title": "string",
              "summary": "string",
              "authors": [
                "string"
              ],
              "published": "2019-08-24T14:15:22Z",
              "source": "arxiv",
              "metadata": {
                "arxiv_id": "2301.12345",
                "categories": [
                  "cs.AI",
                  "cs.LG"
                ],
                "subreddit": "MachineLearning",
                "score": 42,
                "num_comments": 15
              }
            }
          ]
        }
      ]
    }
  ]
}
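The `limit`/`offset` pair pages through the full list. A sketch of computing the offsets needed to fetch every page, given the `total` returned by the first request (function name is illustrative):

```python
def page_offsets(total: int, limit: int) -> list[int]:
    """Offsets needed to fetch every item, given the server's total count."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    return list(range(0, total, limit))

# e.g. 45 items at the default page size of 20 -> three requests
print(page_offsets(45, 20))  # [0, 20, 40]
```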
Returns 200 with the full result. The `status` field indicates whether the analysis is complete, in progress, or has failed.
| Parameter | Type | Description |
| --- | --- | --- |
| id (required) | string (uuid) | Identifier of the analysis to retrieve. |
{
  "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
  "status": "PENDING",
  "query": "string",
  "type": "research",
  "total_articles_processed": 0,
  "created_at": "2019-08-24T14:15:22Z",
  "topics": [
    {
      "id": "string",
      "title": "string",
      "description": "string",
      "article_count": 0,
      "relevance": 100,
      "articles": [
        {
          "id": "string",
          "title": "string",
          "summary": "string",
          "authors": [
            "string"
          ],
          "published": "2019-08-24T14:15:22Z",
          "source": "arxiv",
          "metadata": {
            "arxiv_id": "2301.12345",
            "categories": [
              "cs.AI",
              "cs.LG"
            ],
            "subreddit": "MachineLearning",
            "score": 42,
            "num_comments": 15
          }
        }
      ]
    }
  ]
}
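Since submission returns 202, clients typically poll this endpoint until the status leaves the in-progress states. A sketch of a capped exponential backoff schedule for that loop; note that only `"PENDING"` appears in the sample, so the terminal status names below are assumptions to check against your deployment:

```python
import itertools

def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 30.0):
    """Yield capped exponential delays (in seconds) between status polls."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

def is_terminal(status: str) -> bool:
    # "PENDING" appears in the sample response; the terminal names are assumed.
    return status in {"COMPLETED", "FAILED"}

print(list(itertools.islice(backoff_delays(), 6)))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```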
Uses a Large Language Model (LLM) to decide whether the user's query should be answered with peer-reviewed research articles (arXiv) or community-sourced discussions (Reddit). Returns the suggested source and category along with a confidence score.
| Parameter | Type | Description |
| --- | --- | --- |
| query (required) | string | User's natural language query. |
{
  "query": "string"
}
{
  "source": "arxiv",
  "source_type": "research",
  "suggested_category": "string",
  "confidence": 1
}
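One way a caller might use the confidence score: accept the classifier's suggestion only when it clears a threshold, otherwise fall back to a default source. The threshold value and function name here are illustrative choices, not part of the API:

```python
def choose_source(classification: dict, threshold: float = 0.7, fallback: str = "arxiv") -> str:
    """Accept the classified source only when confidence clears the threshold."""
    if classification.get("confidence", 0.0) >= threshold:
        return classification["source"]
    return fallback

print(choose_source({"source": "reddit", "confidence": 0.9}))  # reddit
print(choose_source({"source": "reddit", "confidence": 0.4}))  # arxiv
```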
Generates embeddings for a batch of texts and caches them under the supplied document IDs.
| Parameter | Type | Description |
| --- | --- | --- |
| texts (required) | Array of strings | Texts to generate embeddings for. |
| ids (required) | Array of strings | Document IDs for caching. |
{
  "texts": [
    "string"
  ],
  "ids": [
    "string"
  ]
}
{
  "embeddings": [
    [
      0
    ]
  ],
  "cached_count": 0,
  "found_count": 0
}
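`texts` and `ids` must line up one-to-one. A sketch that validates the pairing and splits large inputs into smaller request bodies; the batch size is an arbitrary client-side choice, not a documented API limit:

```python
def embedding_batches(texts: list[str], ids: list[str], batch_size: int = 32):
    """Yield request bodies for the embeddings endpoint, pairing each text
    with its caching ID and splitting the input into batches."""
    if len(texts) != len(ids):
        raise ValueError("texts and ids must have the same length")
    for i in range(0, len(texts), batch_size):
        yield {"texts": texts[i : i + batch_size], "ids": ids[i : i + batch_size]}

batches = list(embedding_batches(["a", "b", "c"], ["1", "2", "3"], batch_size=2))
print(len(batches))  # 2
print(batches[0])    # {'texts': ['a', 'b'], 'ids': ['1', '2']}
```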
Generates an optimized query for a specific data source using AI.
| Parameter | Type | Description |
| --- | --- | --- |
| source (required) | string | Enum: `"arxiv"`, `"reddit"`. Target data source. |
| search_terms (required) | string | Natural language search terms. |
| filters | object | Source-specific filters. |
{
  "search_terms": "string",
  "filters": {
    "category": "string",
    "subreddit": "string",
    "timeframe": "past_hour",
    "author": "string",
    "language": "string"
  }
}
{
  "query": "string",
  "description": "string",
  "source": "string"
}
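The `filters` object mixes arXiv-style fields (`category`) with Reddit-style fields (`subreddit`, `timeframe`). A client-side sketch that keeps only the keys relevant to the target source before sending; the per-source key sets below are an assumption inferred from the sample, not documented behavior:

```python
# Assumed mapping of filter keys to the source they apply to (inferred, not documented).
SOURCE_FILTER_KEYS = {
    "arxiv": {"category", "author", "language"},
    "reddit": {"subreddit", "timeframe", "author", "language"},
}

def build_query_request(source: str, search_terms: str, filters: dict) -> dict:
    """Drop filter keys that the target source is assumed not to understand."""
    allowed = SOURCE_FILTER_KEYS[source]
    pruned = {k: v for k, v in filters.items() if k in allowed}
    return {"search_terms": search_terms, "filters": pruned}

req = build_query_request(
    "reddit",
    "llm agents",
    {"category": "cs.AI", "subreddit": "MachineLearning", "timeframe": "past_hour"},
)
print(req["filters"])  # {'subreddit': 'MachineLearning', 'timeframe': 'past_hour'}
```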
Takes a prompt and returns generated text from a specified Large Language Model (e.g., Gemini, OpenRouter). This can be used for various generative tasks such as summarizing topics, creating labels, or answering questions based on provided context. The prompt can contain placeholders like [DOCUMENTS] and [KEYWORDS], which should be replaced by the caller.
| Parameter | Type | Description |
| --- | --- | --- |
| prompt (required) | string | The prompt to send to the LLM. Can include placeholders like `[DOCUMENTS]` and `[KEYWORDS]`. |
| model | string | The model to use for generation. Defaults to the service's configured model. |
| max_tokens | integer | Default: `256`. Maximum number of tokens to generate. |
| temperature | number (float) | Default: `0.7`. Controls randomness; lower is more deterministic. |
{
  "prompt": "string",
  "model": "gemini-pro",
  "max_tokens": 256,
  "temperature": 0.7
}
{
  "text": "string",
  "model": "string",
  "prompt": "string"
}
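Since the caller is responsible for replacing the [DOCUMENTS] and [KEYWORDS] placeholders, a small sketch of doing that before sending the prompt (helper name and joining format are illustrative):

```python
def fill_prompt(template: str, documents: list[str], keywords: list[str]) -> str:
    """Substitute the [DOCUMENTS] and [KEYWORDS] placeholders before
    sending the prompt to the generation endpoint."""
    return (
        template.replace("[DOCUMENTS]", "\n".join(documents))
                .replace("[KEYWORDS]", ", ".join(keywords))
    )

prompt = fill_prompt(
    "Label this topic.\nDocuments:\n[DOCUMENTS]\nKeywords: [KEYWORDS]",
    documents=["Paper on diffusion models."],
    keywords=["diffusion", "generative"],
)
print(prompt)
```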
Retrieves articles from research or community sources.
| Parameter | Type | Description |
| --- | --- | --- |
| query (required) | string | Search query (source-specific syntax). |
| limit | integer | Default: `50`. Maximum articles to fetch. |
| source (required) | string | Enum: `"arxiv"`, `"reddit"`. Specific source to fetch from. |
| category | string | Source-specific category (e.g., `cs.AI` or `MachineLearning`). |
| filters | object | Additional filters understood by the target source. |
{
  "query": "string",
  "limit": 50,
  "source": "arxiv",
  "category": "string",
  "filters": { }
}
{
  "articles": [
    {
      "id": "string",
      "title": "string",
      "summary": "string",
      "authors": [
        "string"
      ],
      "published": "2019-08-24T14:15:22Z",
      "source": "arxiv",
      "metadata": {
        "arxiv_id": "2301.12345",
        "categories": [
          "cs.AI",
          "cs.LG"
        ],
        "subreddit": "MachineLearning",
        "score": 42,
        "num_comments": 15
      }
    }
  ],
  "total_found": 0,
  "source": "arxiv"
}
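The `metadata` object's schema carries both arXiv fields and Reddit fields. A sketch of extracting only the fields relevant to each article's source; the field grouping is assumed from the sample values, not documented:

```python
def source_metadata(article: dict) -> dict:
    """Pick the metadata fields assumed to apply to the article's source."""
    meta = article.get("metadata", {})
    if article.get("source") == "arxiv":
        keys = ("arxiv_id", "categories")
    else:  # reddit
        keys = ("subreddit", "score", "num_comments")
    return {k: meta[k] for k in keys if k in meta}

article = {
    "source": "arxiv",
    "metadata": {"arxiv_id": "2301.12345", "categories": ["cs.AI"], "score": 42},
}
print(source_metadata(article))  # {'arxiv_id': '2301.12345', 'categories': ['cs.AI']}
```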
Clusters articles into coherent topics using their cached embeddings.
| Parameter | Type | Description |
| --- | --- | --- |
| query (required) | string | Original user query. |
| article_ids (required) | Array of strings | Article IDs with cached embeddings. |
| articles (required) | Array of objects (Article) | Article metadata. |
| min_cluster_size | integer | Default: `2`. Minimum articles per topic. |
| nr_topics | integer (>= 1) | Default: `5`. Maximum number of topics to generate. |
{
  "query": "string",
  "article_ids": [
    "string"
  ],
  "articles": [
    {
      "id": "string",
      "title": "string",
      "summary": "string",
      "authors": [
        "string"
      ],
      "published": "2019-08-24T14:15:22Z",
      "source": "arxiv",
      "metadata": {
        "arxiv_id": "2301.12345",
        "categories": [
          "cs.AI",
          "cs.LG"
        ],
        "subreddit": "MachineLearning",
        "score": 42,
        "num_comments": 15
      }
    }
  ],
  "min_cluster_size": 2,
  "nr_topics": 5
}
{
  "query": "string",
  "topics": [
    {
      "id": "string",
      "title": "string",
      "description": "string",
      "article_count": 0,
      "relevance": 100,
      "articles": [
        {
          "id": "string",
          "title": "string",
          "summary": "string",
          "authors": [
            "string"
          ],
          "published": "2019-08-24T14:15:22Z",
          "source": "arxiv",
          "metadata": {
            "arxiv_id": "2301.12345",
            "categories": [
              "cs.AI",
              "cs.LG"
            ],
            "subreddit": "MachineLearning",
            "score": 42,
            "num_comments": 15
          }
        }
      ]
    }
  ],
  "total_articles_processed": 0
}
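Each topic carries a `relevance` score (`100` in the sample). A purely client-side sketch of ordering the response's topics by relevance before display; the function name is illustrative:

```python
def topics_by_relevance(response: dict) -> list[dict]:
    """Return the response's topics sorted by descending relevance."""
    return sorted(response.get("topics", []), key=lambda t: t.get("relevance", 0), reverse=True)

resp = {
    "topics": [
        {"id": "a", "relevance": 40, "article_count": 2},
        {"id": "b", "relevance": 90, "article_count": 5},
    ],
    "total_articles_processed": 7,
}
ordered = topics_by_relevance(resp)
print([t["id"] for t in ordered])  # ['b', 'a']
```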