Welcome to pytrojmiastopl’s documentation!
Introduction
pytrojmiastopl supplies two methods for scraping data from the www.ogloszenia.trojmiasto.pl website.
Scraping category data
This method scrapes the available offer URLs from trojmiasto.pl search results for the given parameters.
It can be used like this:
input_dict = {"cena[]": (300, None)}
parsed_urls = trojmiastopl.category.get_category("nieruchomosci-mam-do-wynajecia", "Gdańsk", **input_dict)
The above code puts a list of URLs for all the apartments found in the given category into the parsed_urls variable.
Scraping offer data
This method scrapes all offer details from the given offer URLs.
It can be used like this:
descriptions = trojmiastopl.offer.get_descriptions(parsed_urls)
The above code puts a list of offer details, one for each offer URL provided in parsed_urls, into the descriptions variable.
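The two steps can also be chained in one helper. The following is only a sketch: `scrape_offers` is a hypothetical name, not part of pytrojmiastopl, and the two scraping functions are passed in as callables so the example runs without network access.

```python
def scrape_offers(get_category, get_descriptions, category, region=None, **filters):
    """Chain the two scraping steps: collect offer URLs, then fetch their details.

    get_category and get_descriptions are injected callables (an assumption made
    for this sketch) so the helper can be exercised with stand-ins.
    """
    urls = get_category(category, region, **filters)
    if not urls:
        # Nothing matched the filters, so there is nothing to describe.
        return []
    return get_descriptions(urls)
```

With the library installed, it could be called as `scrape_offers(trojmiastopl.category.get_category, trojmiastopl.offer.get_descriptions, "nieruchomosci-mam-do-wynajecia", "Gdańsk", **{"cena[]": (300, None)})`.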
Category methods
trojmiastopl.category.get_category(category, region=None, **filters)
Parses available offer URLs from every page of the given category.
Parameters:
- category – Search category
- region – Search region
- filters – Dictionary with additional filters. The following example dictionary contains every possible filter with examples of its values.
Example:
input_dict = {
    "offer_type": "Mieszkanie",    # offer type. See utils.decode_type for reference
    "cena[]": (300, None),         # price (from, to). Use None to omit one of the bounds
    "kaucja[]": (100, 1000),       # deposit
    "cena_za_m2[]": (5, 100),      # price per m2
    "powierzchnia[]": (23, 300),   # surface
    "l_pokoi[]": (2, 5),           # desired number of rooms
    "pietro[]": (-1, 6),           # desired floor, enum: from 1 to 49 and -1 (ground floor)
    "l_pieter[]": (1, 10),         # desired total number of floors in building
    "rok_budowy[]": (2003, 2017),  # year of construction
    "data_wprow": "1d"             # date the offer was added. Available: 1d - today, 3d - 3 days ago,
                                   # 1w - one week ago, 3w - 3 weeks ago
}
Returns: List of all offers for the given parameters
Return type: list
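Typing the bracketed key names by hand is error-prone, so a small wrapper can help. This is a minimal sketch: `build_filters` and its argument names are hypothetical, and only the dictionary keys come from the example above.

```python
def build_filters(offer_type=None, price=None, rooms=None, surface=None, added=None):
    """Build a filters dict using key names from the documented example.

    Range filters are (from, to) tuples; None leaves that side open.
    build_filters is a hypothetical convenience, not part of the library.
    """
    candidates = {
        "offer_type": offer_type,
        "cena[]": price,
        "l_pokoi[]": rooms,
        "powierzchnia[]": surface,
        "data_wprow": added,
    }
    # Keep only the filters that were explicitly requested.
    return {key: value for key, value in candidates.items() if value is not None}


filters = build_filters(price=(300, None), rooms=(2, 5), added="1d")
print(filters)
```

The result can then be passed on as `trojmiastopl.category.get_category(category, region, **filters)`.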

trojmiastopl.category.get_offers_for_page(category, region, page, **filters)
Parses offers for one specific page of the given category with filters.
Parameters:
- category (str) – Search category
- region (str) – Search region
- page (int) – Page number
- filters (dict) – See trojmiastopl.category.get_category for reference
Returns: List of all offers for the given page and parameters
Return type: list

trojmiastopl.category.get_page_count(markup)
Reads the total number of pages from a trojmiasto.pl search page.
Parameters: markup (str) – trojmiasto.pl search page markup
Returns: Total number of pages
Return type: int
Except: If no page number is found, there is just one page.
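The fallback to a single page can be illustrated with a short sketch. The markup convention used here (pagination links carrying a 1-based `strona=N` query parameter) is invented for illustration and may not match the real trojmiasto.pl pages or the library's own parser; `page_count_from_markup` is a hypothetical name.

```python
import re


def page_count_from_markup(markup):
    """Return the highest page number linked in the markup, or 1 if none is found."""
    # Hypothetical convention: pagination links look like href="...?strona=N".
    numbers = [int(n) for n in re.findall(r"\bstrona=(\d+)", markup)]
    # No pagination links at all means the results fit on a single page.
    return max(numbers) if numbers else 1
```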

trojmiastopl.category.get_page_count_for_filters(category, region=None, **filters)
Reads the total number of pages for the given search filters.
Parameters:
- category (str) – Search category
- region (str) – Search region
- filters (dict) – See trojmiastopl.category.get_category for reference
Returns: Total number of pages
Return type: int
Except: If no page number is found, there is just one page.
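get_category is described as parsing offers from every page. The sketch below shows one way a page count and a per-page parser might be combined; it is not the library's actual implementation. `collect_all_pages` is a hypothetical helper, the starting page number is an assumption (the documentation does not say whether pages start at 0 or 1), and the duplicate check is an addition of this sketch.

```python
def collect_all_pages(page_count, get_offers_for_page, first_page=0):
    """Gather offers from every page, skipping URLs that were already seen.

    first_page=0 is an assumption; adjust it if page numbering starts at 1.
    get_offers_for_page is any callable that takes a page number.
    """
    seen = set()
    offers = []
    for page in range(first_page, first_page + page_count):
        for url in get_offers_for_page(page):
            if url not in seen:  # guard against a URL listed on more than one page
                seen.add(url)
                offers.append(url)
    return offers
```

With the library, the per-page callable could be `lambda page: trojmiastopl.category.get_offers_for_page(category, region, page, **filters)` and the count could come from `trojmiastopl.category.get_page_count_for_filters(category, region, **filters)`.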
Offer methods
trojmiastopl.offer.get_additional_information(offer_markup)
Searches for additional info and heating type.
Parameters: offer_markup (str) – Class “sidebar” from offer page markup
Returns: Additional info with optional heating type
Return type: dict

trojmiastopl.offer.get_apartment_type(offer_markup)
Searches for the apartment type in offer markup.
Parameters: offer_markup (str) – Class “sidebar” from offer page markup
Returns: Apartment type
Return type: str

trojmiastopl.offer.get_available_from(offer_markup)
Searches for the “available from” date in offer markup.
Parameters: offer_markup (str) – Class “sidebar” from offer page markup
Returns: Available from, or None if there is no information
Return type: str, None

trojmiastopl.offer.get_furnished(offer_markup)
Checks whether the offer is marked as furnished.
Parameters: offer_markup (str) – Class “sidebar” from offer page markup
Returns: Whether the offer is furnished
Return type: bool
Except: Returns None if there is no information on whether the offer is furnished.

trojmiastopl.offer.get_img_url(offer_markup)
Searches for images in offer markup.
Parameters: offer_markup (str) – Id “gallery” from offer page markup
Returns: List of the offer’s images
Return type: list

trojmiastopl.offer.get_month_num_for_string(value)
Map for Polish month names.
Parameters: value (str) – Month value
Returns: Month number
Return type: int
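A month-name map of this kind can be sketched as a plain dictionary lookup. The genitive forms used below (as they appear in written Polish dates) are an assumption; the library may accept other forms or abbreviations. `POLISH_MONTHS` and `month_number` are hypothetical names.

```python
# Polish month names in the genitive case, as used in written dates (assumed form).
POLISH_MONTHS = {
    "stycznia": 1, "lutego": 2, "marca": 3, "kwietnia": 4,
    "maja": 5, "czerwca": 6, "lipca": 7, "sierpnia": 8,
    "września": 9, "października": 10, "listopada": 11, "grudnia": 12,
}


def month_number(value):
    """Map a Polish month name to its number (1-12); raises KeyError if unknown."""
    return POLISH_MONTHS[value.strip().lower()]
```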

trojmiastopl.offer.get_surface(offer_markup)
Searches for the surface in offer markup.
Parameters: offer_markup (str) – Class “sidebar” from offer page markup
Returns: Surface, or None if there is no surface
Return type: float, None
Except: Returns None when the offer has no surface.
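Returning either a float or None can be illustrated as follows. This sketch works on already extracted text rather than on the sidebar markup, and the text format it expects (a number with a decimal comma or point followed by "m") is an assumption; `parse_surface` is a hypothetical name.

```python
import re


def parse_surface(text):
    """Extract a surface value as float from text, or return None if absent."""
    # Accept both the Polish decimal comma and a decimal point (assumed format).
    match = re.search(r"(\d+(?:[.,]\d+)?)\s*m", text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))
```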

trojmiastopl.offer.get_title(offer_markup)
Searches for the offer title on the offer page.
Parameters: offer_markup (str) – Class “title-wrap” from offer page markup
Returns: Title of the offer, or None if there is no title
Return type: str, None
Except: Returns None when the title of the offer page cannot be found.

trojmiastopl.offer.parse_date_to_timestamp(date)
Parses a string date to a Unix timestamp.
Parameters: date (str) – Date
Returns: Unix timestamp
Return type: int
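The conversion from a date string to a Unix timestamp can be sketched with the standard library. The input format (`%Y-%m-%d`) and the choice of UTC are assumptions made for this example; the documentation does not state which date format or time zone the library uses. `date_to_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone


def date_to_timestamp(date, fmt="%Y-%m-%d"):
    """Convert a date string to a Unix timestamp in seconds, treating it as UTC."""
    # Attach UTC explicitly so the result does not depend on the local time zone.
    parsed = datetime.strptime(date, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```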

trojmiastopl.offer.parse_dates_and_id(offer_markup)
Searches for the creation date and the last update date of an offer. Additionally parses the offer id number.
Parameters: offer_markup (str) – Class “sidebar” from offer page markup
Returns: Offer id, plus the date added and date updated if found (id, added, updated)
Return type: dict

trojmiastopl.offer.parse_description(description_markup)
Searches for the offer description.
Parameters: description_markup (str) – Class “ogl-description” from offer page markup
Returns: Offer description
Return type: str

trojmiastopl.offer.parse_flat_data(offer_markup)
Parses flat data from the sidebar.
Parameters: offer_markup (str) – Class “sidebar” from offer page markup
Returns: Information about price, deposit, floor, number of rooms, construction date and total number of floors in the building
Return type: dict

trojmiastopl.offer.parse_offer(url)
Parses data from an offer page URL.
Parameters: url (str) – URL of the current offer page
Returns: Dictionary with all offer details
Return type: dict
Except: If there is no offer title anymore, the offer has been deleted.
Utils methods
trojmiastopl.utils.decode_category_name(category)
Decodes a category name to its value.
Parameters: category (str) – Category name
Returns: Category number
Return type: int

trojmiastopl.utils.decode_type(filter_value)
Decodes an offer type name to its value.
A list of the available options and their translations can be found below.
Parameters: filter_value (str) – One of the available type names
Returns: Int value for the POST variable
Return type: int

trojmiastopl.utils.get_content_for_url(url)
Connects to the given URL.
If the environment variable DEBUG is True, the response for the URL is cached in the /var/temp directory.
Parameters: url (str) – Website URL
Returns: Response for the requested URL
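The DEBUG-controlled caching can be sketched as below. This is not the library's implementation: `cached_fetch` is a hypothetical helper, the download function is injected so the example runs offline, and the cache location, file naming scheme, and accepted DEBUG values are all assumptions.

```python
import hashlib
import os
import tempfile


def cached_fetch(url, fetch, cache_dir=None):
    """Return fetch(url), reusing a file cache when the DEBUG variable is set.

    fetch is any callable returning the response body as text. The cache
    directory, key scheme, and accepted DEBUG values are assumptions.
    """
    if os.environ.get("DEBUG", "").lower() not in ("1", "true"):
        return fetch(url)
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "trojmiastopl-cache")
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so it becomes a safe, fixed-length file name.
    path = os.path.join(cache_dir, hashlib.sha256(url.encode("utf-8")).hexdigest())
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    body = fetch(url)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(body)
    return body
```

Caching responses this way keeps repeated debugging runs from requesting the same pages again.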

trojmiastopl.utils.get_url(category, region=None, **filters)
Creates a URL for the given parameters.
Parameters:
- category (str) – Search category
- region (str) – Search region
- filters (dict) – Dictionary with additional filters. See trojmiastopl.category.get_category for reference
Returns: URL for the given parameters
Return type: str